Preface


In the fine arts, a master class is a small class where students and coaches 
work together to support a high level of technical and creative excellence. 
This book tries to capture the spirit of a master class while providing 
coaching for readers who want to refine their skills as solvers of problems, 
especially those problems dealing with mathematical inequalities. 

The most important prerequisite for benefiting from this book is the 
desire to master the craft of discovery and proof. The formal requirements
are quite modest. Anyone with a solid course in calculus is well
prepared for almost everything to be found here, and perhaps half of the 
material does not even require calculus. Nevertheless, the book develops 
many results which are rarely seen, and even experienced readers are 
likely to find material that is challenging and informative. 

With the Cauchy–Schwarz inequality as the initial guide, the reader
is led through a sequence of interrelated problems whose solutions are
presented as they might have been discovered, either by one of history’s
famous mathematicians or by the reader. The problems emphasize
beauty and surprise, but along the way one finds systematic coverage
of the geometry of squares, convexity, the ladder of power means,
majorization, Schur convexity, exponential sums, and all of the so-called
classical inequalities, including those of Hölder, Hilbert, and Hardy.

To solve a problem is a very human undertaking, and more than a little 
mystery remains about how we best guide ourselves to the discovery of 
original solutions. Still, as George Polya and others have taught us, there 
are principles of problem solving. With practice and good coaching we 
can all improve our skills. Just like singers, actors, or pianists, we have a 
path toward a deeper mastery of our craft. 
Starting with Cauchy


Cauchy’s inequality for real numbers tells us that 


$$a_1b_1 + a_2b_2 + \cdots + a_nb_n \le \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}\,\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2},$$


and there is no doubt that this is one of the most widely used and most 
important inequalities in all of mathematics. A central aim of this course 
— or master class — is to suggest a path to mastery of this inequality, 
its many extensions, and its many applications — from the most basic 
to the most sublime. 


THE TYPICAL PLAN 


The typical chapter in this course is built around the solution of a 
small set of challenge problems. Sometimes a challenge problem is drawn 
from one of the world’s famous mathematical competitions, but more 
often a problem is chosen because it illustrates a mathematical technique 
of wide applicability. 

Ironically, our first challenge problem is an exception. To be sure, the 
problem hopes to offer honest coaching in techniques of importance, but 
it is unusual in that it asks you to solve a problem that you are likely to 
have seen before. Nevertheless, the challenge is sincere; almost everyone 
finds some difficulty directing fresh thoughts toward a familiar problem. 


Problem 1.1 Prove Cauchy’s inequality. Moreover, if you already know 
a proof of Cauchy’s inequality, find another one! 


COACHING FOR A PLACE TO START 


How does one solve a problem in a fresh way? Obviously there cannot 
be any universal method, but there are some hints that almost always 
help. One of the best of these is to try to solve the problem by means 
of a specific principle or specific technique. 

Here, for example, one might insist on proving Cauchy’s inequality 
just by algebra, or just by geometry, by trigonometry, or by calculus.
Miraculously enough, Cauchy’s inequality is wonderfully provable, and 
each of these approaches can be brought to a successful conclusion. 


A PRINCIPLED BEGINNING 


If one takes a dispassionate look at Cauchy’s inequality, there is another
principle that suggests itself. Any time one faces a valid proposition
that depends on an integer n, there is a reasonable chance that
mathematical induction will lead to a proof. Since none of the standard 
texts in algebra or analysis gives such a proof of Cauchy’s inequality, 
this principle also has the benefit of offering us a path to an “original” 
proof — provided, of course, that we find any proof at all. 

When we look at Cauchy’s inequality for n = 1, we see that the 
inequality is trivially true. This is all we need to start our induction, 
but it does not offer us any insight. If we hope to find a serious idea, 
we need to consider n = 2 and, in this second case, Cauchy’s inequality 
just says 

$$(a_1b_1 + a_2b_2)^2 \le (a_1^2 + a_2^2)(b_1^2 + b_2^2). \tag{1.1}$$


This is a simple assertion, and you may see at a glance why it is true. 
Still, for the sake of argument, let us suppose that this inequality is not 
so obvious. How then might one search systematically for a proof? 

Plainly, there is nothing more systematic than simply expanding both 
sides to find the equivalent inequality 


$$a_1^2b_1^2 + 2a_1b_1a_2b_2 + a_2^2b_2^2 \le a_1^2b_1^2 + a_1^2b_2^2 + a_2^2b_1^2 + a_2^2b_2^2,$$


then, after we make the natural cancellations and collect terms to one 
side, we see that inequality (1.1) is also equivalent to the assertion that 


$$0 \le (a_1b_2)^2 - 2(a_1b_2)(a_2b_1) + (a_2b_1)^2. \tag{1.2}$$


This equivalent inequality actually puts the solution of our problem
within reach. From the well-known factorization $x^2 - 2xy + y^2 = (x - y)^2$
one finds


$$(a_1b_2)^2 - 2(a_1b_2)(a_2b_1) + (a_2b_1)^2 = (a_1b_2 - a_2b_1)^2, \tag{1.3}$$


and the nonnegativity of this term confirms the truth of inequality (1.2). 
By our chain of equivalences, we find that inequality (1.1) is also true, 
and thus we have proved Cauchy’s inequality for n = 2. 
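The chain of equivalences above rests on one algebraic identity, and that identity is easy to spot-check by machine. The following Python sketch (the helper names `lagrange_gap` and `check_n2_identity` are hypothetical, chosen only for this illustration) uses exact integer arithmetic to confirm, for many random inputs, that the right side of (1.1) minus the left side equals $(a_1b_2 - a_2b_1)^2$. A spot check of this kind is not a proof, but it is a quick guard against algebra slips.

```python
import random

def lagrange_gap(a1, a2, b1, b2):
    """Right side of (1.1) minus its left side, computed directly."""
    return (a1**2 + a2**2) * (b1**2 + b2**2) - (a1 * b1 + a2 * b2) ** 2

def check_n2_identity(trials=1000, seed=0):
    """Compare the gap with the square in identity (1.3) on random integers."""
    rng = random.Random(seed)
    for _ in range(trials):
        a1, a2, b1, b2 = (rng.randint(-50, 50) for _ in range(4))
        gap = lagrange_gap(a1, a2, b1, b2)
        assert gap == (a1 * b2 - a2 * b1) ** 2   # identity (1.3)
        assert gap >= 0                          # hence inequality (1.1)
    return True

print(check_n2_identity())  # True
```

Integers are used so that the comparison with the square is exact rather than subject to floating-point rounding.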

THE INDUCTION STEP 


Now that we have proved a nontrivial case of Cauchy’s inequality, we 
are ready to look at the induction step. If we let H(n) stand for the
hypothesis that Cauchy’s inequality is valid for n, we need to show that 
H(2) and H(n) imply H(n+1). With this plan in mind, we do not need 
long to think of first applying the hypothesis H(n) and then using H(2) 
to stitch together the two remaining pieces. Specifically, we have 


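$$\begin{aligned}
a_1b_1 + a_2b_2 + \cdots + a_nb_n + a_{n+1}b_{n+1}
&= (a_1b_1 + a_2b_2 + \cdots + a_nb_n) + a_{n+1}b_{n+1}\\
&\le (a_1^2 + a_2^2 + \cdots + a_n^2)^{1/2}(b_1^2 + b_2^2 + \cdots + b_n^2)^{1/2} + a_{n+1}b_{n+1}\\
&\le (a_1^2 + a_2^2 + \cdots + a_n^2 + a_{n+1}^2)^{1/2}(b_1^2 + b_2^2 + \cdots + b_n^2 + b_{n+1}^2)^{1/2},
\end{aligned}$$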


where in the first inequality we used the induction hypothesis H(n), and 
in the second inequality we used H(2) in the form 


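$$\alpha\beta + a_{n+1}b_{n+1} \le (\alpha^2 + a_{n+1}^2)^{1/2}(\beta^2 + b_{n+1}^2)^{1/2}$$

with the new variables

$$\alpha = (a_1^2 + a_2^2 + \cdots + a_n^2)^{1/2} \quad\text{and}\quad \beta = (b_1^2 + b_2^2 + \cdots + b_n^2)^{1/2}.$$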
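The induction step can be traced numerically, one new pair at a time. In the Python sketch below (the function `induction_chain` is our own illustration, not something from the text), the running values of $\alpha$ and $\beta$ are updated by $\alpha \mapsto (\alpha^2 + a_{n+1}^2)^{1/2}$ and $\beta \mapsto (\beta^2 + b_{n+1}^2)^{1/2}$, and both inequalities of the chain are checked at every stage with a small floating-point tolerance.

```python
import math
import random

def induction_chain(a, b, tol=1e-12):
    """Retrace the induction step, adding one pair (a_k, b_k) at a time.

    Returns the final sum a_1 b_1 + ... + a_n b_n together with the product
    alpha * beta of the two Euclidean norms.
    """
    s = a[0] * b[0]
    alpha, beta = abs(a[0]), abs(b[0])
    assert s <= alpha * beta + tol                    # H(1)
    for x, y in zip(a[1:], b[1:]):
        middle = alpha * beta + x * y                 # after applying H(n)
        new_alpha = math.hypot(alpha, x)              # (alpha^2 + x^2)^{1/2}
        new_beta = math.hypot(beta, y)                # (beta^2 + y^2)^{1/2}
        assert s + x * y <= middle + tol              # first inequality
        assert middle <= new_alpha * new_beta + tol   # H(2) stitches the pieces
        s, alpha, beta = s + x * y, new_alpha, new_beta
    return s, alpha * beta

rng = random.Random(1)
a = [rng.uniform(-1, 1) for _ in range(20)]
b = [rng.uniform(-1, 1) for _ in range(20)]
lhs, rhs = induction_chain(a, b)
print(lhs <= rhs)  # True
```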


The only difficulty one might have finding this proof comes in the 
last step where we needed to see how to use H(2). In this case the 
difficulty was quite modest, yet it anticipates the nature of the challenge 
one finds in more sophisticated problems. The actual application of 
Cauchy’s inequality is never difficult; the challenge always comes from 
seeing where Cauchy’s inequality should be applied and what one gains 
from the application. 


THE PRINCIPLE OF QUALITATIVE INFERENCES 


Mathematical progress depends on the existence of a continuous stream 
of new problems, yet the processes that generate such problems may 
seem mysterious. To be sure, there is genuine mystery in any deeply 
original problem, but most new problems evolve quite simply from well- 
established principles. One of the most productive of these principles 
calls on us to expand our understanding of a quantitative result by first 
focusing on its qualitative inferences. 

Almost any significant quantitative result will have some immediate 
qualitative corollaries and, in many cases, these corollaries can be derived 
independently, without recourse to the result that first brought them to 
light. The alternative derivations we obtain this way often help us to see 
the fundamental nature of our problem more clearly. Also, much more 
often than one might guess, the qualitative approach even yields new 
quantitative results. The next challenge problem illustrates how these
vague principles can work in practice. 


Problem 1.2 One of the most immediate qualitative inferences from 
Cauchy’s inequality is the simple fact that 


$$\sum_{k=1}^\infty a_k^2 < \infty \ \text{ and } \ \sum_{k=1}^\infty b_k^2 < \infty \ \text{ imply that } \ \sum_{k=1}^\infty |a_kb_k| < \infty. \tag{1.4}$$


Give a proof of this assertion that does not call on Cauchy’s inequality. 


When we consider this challenge, we are quickly drawn to the realization
that we need to show that the product $a_kb_k$ is small when $a_k$ and
$b_k$ are small. We could be sure of this inference if we could prove the
existence of a constant $C$ such that


$$xy \le C(x^2 + y^2) \quad\text{for all real } x, y.$$


Fortunately, as soon as one writes down this inequality, there is a good 
chance of recognizing why it is true. In particular, one might draw the 
link to the familiar factorization 


$$0 \le (x - y)^2 = x^2 - 2xy + y^2,$$
and this observation is all one needs to obtain the bound
$$xy \le \tfrac{1}{2}x^2 + \tfrac{1}{2}y^2 \quad\text{for all real } x, y. \tag{1.5}$$
Now, when we apply this inequality to $x = |a_k|$ and $y = |b_k|$ and then
sum over all $k$, we find the interesting additive inequality


$$\sum_{k=1}^\infty |a_kb_k| \le \frac{1}{2}\sum_{k=1}^\infty a_k^2 + \frac{1}{2}\sum_{k=1}^\infty b_k^2. \tag{1.6}$$


This bound gives us another way to see the truth of the qualitative 
assertion (1.4) and, thus, it passes one important test. Still, there are 
other tests to come. 
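The additive inequality (1.6) is easy to test on partial sums. The Python sketch below uses the square-summable sequences $a_k = 1/k$ and $b_k = (-1)^k/(k+1)$, an illustrative choice of ours rather than the text's; as the number of terms grows, the right side of (1.6) stays bounded, which is exactly the qualitative inference (1.4).

```python
def additive_sides(a, b):
    """Left and right sides of the additive bound (1.6) for finite sequences."""
    lhs = sum(abs(x * y) for x, y in zip(a, b))
    rhs = 0.5 * sum(x * x for x in a) + 0.5 * sum(y * y for y in b)
    return lhs, rhs

# Partial sums of two illustrative square-summable sequences.
for N in (10, 1_000, 100_000):
    a = [1 / k for k in range(1, N + 1)]
    b = [(-1) ** k / (k + 1) for k in range(1, N + 1)]
    lhs, rhs = additive_sides(a, b)
    print(N, lhs <= rhs, round(rhs, 4))
```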


A TEST OF STRENGTH 


Any time one meets a new inequality, one is almost duty bound to 
test the strength of that inequality. Here that obligation boils down 
to asking how close the new additive inequality comes to matching the 
quantitative estimates that one finds from Cauchy’s inequality. 

The additive bound (1.6) has two terms on the right-hand side, and 
Cauchy’s inequality has just one. Thus, as a first step, we might look 
for a way to combine the two terms of the additive bound (1.6), and a
natural way to implement this idea is to normalize the sequences $\{a_k\}$
and $\{b_k\}$ so that each of the right-hand sums is equal to one.

Thus, if neither of the sequences is made up of all zeros, we can introduce
new variables

$$\hat a_k = a_k \Big/ \Big(\sum_j a_j^2\Big)^{1/2} \quad\text{and}\quad \hat b_k = b_k \Big/ \Big(\sum_j b_j^2\Big)^{1/2},$$


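which are normalized in the sense that
$$\sum_k \hat a_k^2 = \sum_k \Big\{a_k^2 \Big/ \Big(\sum_j a_j^2\Big)\Big\} = 1$$
and
$$\sum_k \hat b_k^2 = \sum_k \Big\{b_k^2 \Big/ \Big(\sum_j b_j^2\Big)\Big\} = 1.$$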


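Now, when we apply inequality (1.6) to the sequences $\{\hat a_k\}$ and $\{\hat b_k\}$,
we obtain the simple-looking bound

$$\sum_{k=1}^\infty |\hat a_k\hat b_k| \le \frac{1}{2}\sum_{k=1}^\infty \hat a_k^2 + \frac{1}{2}\sum_{k=1}^\infty \hat b_k^2 = 1$$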


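and, in terms of the original sequences $\{a_k\}$ and $\{b_k\}$, we have

$$\sum_{k=1}^\infty \Big\{|a_k| \Big/ \Big(\sum_j a_j^2\Big)^{1/2}\Big\}\Big\{|b_k| \Big/ \Big(\sum_j b_j^2\Big)^{1/2}\Big\} \le 1.$$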


Finally, when we clear the denominators, we find our old friend Cauchy’s 
inequality — though this time it also covers the case of possibly infinite 
sequences: 


$$\sum_{k=1}^\infty a_kb_k \le \Big(\sum_{k=1}^\infty a_k^2\Big)^{1/2}\Big(\sum_{k=1}^\infty b_k^2\Big)^{1/2}. \tag{1.7}$$


The additive bound (1.6) led us to a proof of Cauchy’s inequality 
which is quick, easy, and modestly entertaining, but it also connects to 
a larger theme. Normalization gives us a systematic way to pass from 
an additive inequality to a multiplicative inequality, and this is a trip 
we will often need to make in the pages that follow. 
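The normalization step is mechanical enough to write down directly. In this Python sketch (the name `cauchy_via_normalization` is hypothetical), each sequence is divided by the square root of its sum of squares, the additive bound (1.6) is checked for the normalized sequences, and the two sides of the multiplicative bound (1.7) are returned. As in the text, it assumes that neither sequence is identically zero.

```python
import math

def cauchy_via_normalization(a, b, tol=1e-12):
    """Pass from the additive bound (1.6) to the multiplicative bound (1.7).

    Assumes, as in the text, that neither sequence is identically zero.
    Returns the two sides of (1.7): sum a_k b_k and the product of the norms.
    """
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    a_hat = [x / na for x in a]          # now the sum of a_hat_k^2 equals 1
    b_hat = [y / nb for y in b]          # now the sum of b_hat_k^2 equals 1
    # Additive bound (1.6) for the normalized sequences; its right side is 1.
    additive = sum(abs(x * y) for x, y in zip(a_hat, b_hat))
    half_sums = 0.5 * sum(x * x for x in a_hat) + 0.5 * sum(y * y for y in b_hat)
    assert additive <= half_sums + tol
    # Clearing the denominators na and nb turns "<= 1" into (1.7).
    return sum(x * y for x, y in zip(a, b)), na * nb

lhs, rhs = cauchy_via_normalization([1, 2, 3], [4, -5, 6])
print(lhs, round(rhs, 2))  # 12 32.83
```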


ITEM IN THE DOCK: THE CASE OF EQUALITY 


One of the enduring principles that emerges from an examination 
of the ways that inequalities are developed and applied is that many
benefits flow from understanding when an inequality is sharp, or nearly 
sharp. In most cases, this understanding pivots on the discovery of the 
circumstances where equality can hold. 

For Cauchy’s inequality this principle suggests that we should ask 
ourselves about the relationship that must exist between the sequences 
$\{a_k\}$ and $\{b_k\}$ in order for us to have

$$\sum_{k=1}^\infty a_kb_k = \Big(\sum_{k=1}^\infty a_k^2\Big)^{1/2}\Big(\sum_{k=1}^\infty b_k^2\Big)^{1/2}. \tag{1.8}$$

If we focus our attention on the nontrivial case where neither of the 
sequences is identically zero and where both of the sums on the right- 
hand side of the identity (1.8) are finite, then we see that each of the 
steps we used in the derivation of the bound (1.7) can be reversed. Thus 
one finds that the identity (1.8) implies the identity 


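$$\sum_{k=1}^\infty \hat a_k\hat b_k = \frac{1}{2}\sum_{k=1}^\infty \hat a_k^2 + \frac{1}{2}\sum_{k=1}^\infty \hat b_k^2. \tag{1.9}$$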


By the two-term bound $xy \le (x^2 + y^2)/2$, we also know that
$$\hat a_k\hat b_k \le \tfrac{1}{2}\hat a_k^2 + \tfrac{1}{2}\hat b_k^2 \quad\text{for all } k = 1, 2, \ldots, \tag{1.10}$$
and from these we see that if strict inequality were to hold for even one
value of $k$ then we could not have the equality (1.9). This observation
tells us in turn that the case of equality (1.8) can hold for nonzero series
only when equality holds in (1.10) for every $k$, that is, only when
$(\hat a_k - \hat b_k)^2 = 0$, or $\hat a_k = \hat b_k$, for all $k = 1, 2, \ldots$. By the definition of these
normalized values, we then see that 


$$a_k = \lambda b_k \quad\text{for all } k = 1, 2, \ldots, \tag{1.11}$$


where the constant $\lambda$ is given by the ratio

$$\lambda = \Big(\sum_{j=1}^\infty a_j^2\Big)^{1/2} \Big/ \Big(\sum_{j=1}^\infty b_j^2\Big)^{1/2}.$$


Here one should note that our argument was brutally straightforward, 
and thus, our problem was not much of a challenge. Nevertheless, the 
result still expresses a minor miracle; the one identity (1.8) has the 
strength to imply an infinite number of identities, one for each value of 
k =1,2,... in equation (1.11). 
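The case of equality can be illustrated numerically. The Python sketch below (the helper `cauchy_gap` and the sample values are our own illustration) compares a sequence $a_k = \lambda b_k$, which should give equality in (1.7) up to rounding, with a copy perturbed in a single coordinate, which should make the inequality strict; it also recovers $\lambda$ as the ratio of the two square-root sums.

```python
import math

def cauchy_gap(a, b):
    """Right side of (1.7) minus the left side, for finite sequences."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return na * nb - sum(x * y for x, y in zip(a, b))

b = [1.0, -2.0, 0.5, 3.0]
lam = 2.5
a = [lam * y for y in b]                    # a_k = lambda * b_k, as in (1.11)
print(abs(cauchy_gap(a, b)) < 1e-12)        # True: equality up to rounding

ratio = math.sqrt(sum(x * x for x in a)) / math.sqrt(sum(y * y for y in b))
print(math.isclose(ratio, lam))             # True: lambda is the ratio of norms

a_perturbed = a.copy()
a_perturbed[2] += 0.1                       # break proportionality at k = 3
print(cauchy_gap(a_perturbed, b) > 0)       # True: strict inequality
```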
BENEFITS OF GOOD NOTATION


Sums such as those appearing in Cauchy’s inequality are just barely 
manageable typographically and, as one starts to add further features, 
they can become unwieldy. Thus, we often benefit from the introduction 
of shorthand notation such as 


$$\langle a, b\rangle = \sum_{j=1}^n a_jb_j, \tag{1.12}$$


where $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$. This shorthand now
permits us to write Cauchy’s inequality quite succinctly as 


$$\langle a, b\rangle \le \langle a, a\rangle^{1/2}\langle b, b\rangle^{1/2}. \tag{1.13}$$


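Parsimony is fine, but there are even deeper benefits to this notation
if one provides it with a more abstract interpretation. Specifically, if
$V$ is a real vector space (such as $\mathbb{R}^d$), then we say that a function on
$V \times V$ defined by the mapping $(a, b) \mapsto \langle a, b\rangle$ is an inner product and
we say that $(V, \langle\cdot,\cdot\rangle)$ is a real inner product space provided that the pair
$(V, \langle\cdot,\cdot\rangle)$ has the following five properties:

(i) $\langle v, v\rangle \ge 0$ for all $v \in V$,

(ii) $\langle v, v\rangle = 0$ if and only if $v = 0$,

(iii) $\langle \alpha v, w\rangle = \alpha\langle v, w\rangle$ for all $\alpha \in \mathbb{R}$ and all $v, w \in V$,

(iv) $\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle$ for all $u, v, w \in V$, and finally,

(v) $\langle v, w\rangle = \langle w, v\rangle$ for all $v, w \in V$.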


One can easily check that the shorthand introduced by the sum (1.12)
has each of these properties, but there are many further examples of useful
inner products. For example, if we fix a set of positive real numbers
$\{w_j : j = 1, 2, \ldots, n\}$ then we can just as easily define an inner product
on $\mathbb{R}^n$ with the weighted sums


$$\langle a, b\rangle = \sum_{j=1}^n a_jb_jw_j \tag{1.14}$$


and, with this definition, one can check just as before that $\langle a, b\rangle$ satisfies
all of the properties that one requires of an inner product. Moreover, this 
example only reveals the tip of an iceberg; there are many useful inner 
products, and they occur in a great variety of mathematical contexts. 
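For the weighted sums (1.14), a quick numerical experiment shows Cauchy’s inequality surviving the change of inner product. The Python sketch below (the function names and the random positive weights are illustrative assumptions) checks symmetry, positivity, and the bound $\langle a, b\rangle \le \langle a, a\rangle^{1/2}\langle b, b\rangle^{1/2}$ for many random vectors; it tests instances and proves nothing.

```python
import math
import random

def weighted_inner(a, b, w):
    """Weighted inner product (1.14): the sum of a_j * b_j * w_j, each w_j > 0."""
    return sum(x * y * z for x, y, z in zip(a, b, w))

def weighted_cauchy_holds(n=6, trials=1000, seed=2, tol=1e-12):
    rng = random.Random(seed)
    w = [rng.uniform(0.1, 5.0) for _ in range(n)]          # positive weights
    for _ in range(trials):
        a = [rng.uniform(-1, 1) for _ in range(n)]
        b = [rng.uniform(-1, 1) for _ in range(n)]
        if weighted_inner(a, b, w) != weighted_inner(b, a, w):   # property (v)
            return False
        if weighted_inner(a, a, w) < 0:                          # property (i)
            return False
        bound = math.sqrt(weighted_inner(a, a, w)) * math.sqrt(weighted_inner(b, b, w))
        if weighted_inner(a, b, w) > bound + tol:                # Cauchy's bound
            return False
    return True

print(weighted_cauchy_holds())  # True
```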
An especially useful example of an inner product can be given by
considering the set $V = C[a, b]$ of real-valued continuous functions on
the bounded interval $[a, b]$ and by defining $\langle\cdot,\cdot\rangle$ on $V$ by setting


$$\langle f, g\rangle = \int_a^b f(x)g(x)\,dx, \tag{1.15}$$


or more generally, if $w : [a, b] \to \mathbb{R}$ is a continuous function such that
$w(x) > 0$ for all $x \in [a, b]$, then one can define an inner product on
$C[a, b]$ by setting

$$\langle f, g\rangle = \int_a^b f(x)g(x)w(x)\,dx.$$


We will return to these examples shortly, but first there is an opportunity 
that must be seized. 
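The integral inner product (1.15) and its weighted version can be approximated by Riemann sums. The Python sketch below uses a midpoint rule with a default of 10,000 subintervals (both choices are illustrative assumptions). One pleasant feature is that the midpoint approximation is itself a weighted sum of the form (1.14), with positive weights $h\,w(x_i)$, so the discrete Cauchy inequality applies to the approximations directly.

```python
import math

def integral_inner(f, g, a, b, w=lambda x: 1.0, n=10_000):
    """Midpoint-rule approximation of <f, g> = integral_a^b f(x) g(x) w(x) dx."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h                 # midpoint of the i-th subinterval
        total += f(x) * g(x) * w(x)
    return total * h

f = math.sin
g = lambda x: x
ip = integral_inner(f, g, 0.0, math.pi)
bound = math.sqrt(integral_inner(f, f, 0.0, math.pi)) * math.sqrt(integral_inner(g, g, 0.0, math.pi))
print(ip <= bound)                # True
print(abs(ip - math.pi) < 1e-6)   # True: the exact value of the integral is pi
```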


AN OPPORTUNISTIC CHALLENGE 


We now face one of those pleasing moments when good notation suggests
a good theorem. We introduced the idea of an inner product in
order to state the basic form (1.7) of Cauchy’s inequality in a simple
way, and now we find that our notation pulls us toward an interesting
conjecture: Can it be true that in every inner product space one has the
inequality $\langle v, w\rangle \le \langle v, v\rangle^{1/2}\langle w, w\rangle^{1/2}$? This conjecture is indeed true, and
when framed more precisely, it provides our next challenge problem.


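Problem 1.3 For any real inner product space $(V, \langle\cdot,\cdot\rangle)$, one has for all
$v$ and $w$ in $V$ that

$$\langle v, w\rangle \le \langle v, v\rangle^{1/2}\langle w, w\rangle^{1/2}; \tag{1.16}$$

moreover, for nonzero vectors $v$ and $w$, one has

$$\langle v, w\rangle = \langle v, v\rangle^{1/2}\langle w, w\rangle^{1/2} \quad\text{if and only if}\quad v = \lambda w$$

for a positive constant $\lambda$.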


As before, one may be tempted to respond to this challenge by just 
rattling off a previously mastered textbook proof, but that temptation 
should still be resisted. The challenge offered by Problem 1.3 is important,
and it deserves a fresh response, or at least a relatively fresh
response.

For example, it seems appropriate to ask if one might be able to use 
some variation on the additive method which helped us prove the plain 
vanilla version of Cauchy’s inequality. The argument began with the 
observation that $(x - y)^2 \ge 0$ implies $xy \le x^2/2 + y^2/2$, and one might
guess that an analogous idea could work again in the abstract case.
Here, of course, we need to use the defining properties of the inner
product, and, as we go down the list looking for an analog to $(x - y)^2 \ge 0$,
we are quite likely to hit on the idea of using property (i) in the form


$$\langle v - w, v - w\rangle \ge 0.$$


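Now, when we expand this inequality with the help of the other properties
of the inner product $\langle\cdot,\cdot\rangle$, we find that
$0 \le \langle v, v\rangle - 2\langle v, w\rangle + \langle w, w\rangle$, which is to say

$$\langle v, w\rangle \le \tfrac{1}{2}\langle v, v\rangle + \tfrac{1}{2}\langle w, w\rangle. \tag{1.17}$$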
This is a perfect analog of the additive inequality that gave us our second 
proof of the basic Cauchy inequality, and we face a classic situation where 
all that remains is a “matter of technique.” 


A RETRACED PASSAGE — CONVERSION OF AN ADDITIVE BOUND 


Here we are oddly lucky since we have developed only one technique
that is even remotely relevant: the normalization method for converting
an additive inequality into one that is multiplicative. Normalization
means different things in different places, but, if we take our earlier
analysis as our guide, what we want here is to replace $v$ and $w$ with related
terms that reduce the right side of the bound (1.17) to 1.

Since the inequality (1.16) holds trivially if either $v$ or $w$ is equal to
zero, we may assume without loss of generality that $\langle v, v\rangle$ and $\langle w, w\rangle$
are both nonzero, so the normalized variables


$$\hat v = v/\langle v, v\rangle^{1/2} \quad\text{and}\quad \hat w = w/\langle w, w\rangle^{1/2} \tag{1.18}$$


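are well defined. When we substitute these values for $v$ and $w$ in the
bound (1.17), we then find $\langle \hat v, \hat w\rangle \le 1$, since $\langle \hat v, \hat v\rangle = \langle \hat w, \hat w\rangle = 1$.
In terms of the original variables $v$ and $w$, this tells us
$\langle v, w\rangle \le \langle v, v\rangle^{1/2}\langle w, w\rangle^{1/2}$, just as we wanted to
show.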
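The abstract argument uses nothing but the inner product axioms, so it can be run verbatim for any concrete inner product. The Python sketch below assumes, purely for illustration, the inner product $\langle v, w\rangle = v^{\mathsf T}Mw$ on $\mathbb{R}^3$ with a hypothetical symmetric positive definite matrix $M$; it normalizes as in (1.18), checks the additive bound (1.17) for $\hat v$ and $\hat w$, and then checks (1.16) for the original vectors.

```python
import math
import random

# A hypothetical symmetric positive definite matrix; any such matrix defines
# an inner product <v, w> = v^T M w on R^3 (its leading minors are 4, 11, 18).
M = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]

def ip(v, w):
    """The illustrative inner product <v, w> = v^T M w."""
    return sum(v[i] * M[i][j] * w[j] for i in range(3) for j in range(3))

def normalize(v):
    """The map v -> v / <v, v>^{1/2} of (1.18); assumes v is nonzero."""
    s = math.sqrt(ip(v, v))
    return [x / s for x in v]

def normalization_argument_holds(trials=1000, seed=3, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        v = [rng.uniform(-1, 1) for _ in range(3)]
        w = [rng.uniform(-1, 1) for _ in range(3)]
        vh, wh = normalize(v), normalize(w)
        # Additive bound (1.17) for the normalized vectors; its right side is 1.
        if ip(vh, wh) > 0.5 * ip(vh, vh) + 0.5 * ip(wh, wh) + tol:
            return False
        # Undoing the normalization gives the abstract Cauchy inequality (1.16).
        if ip(v, w) > math.sqrt(ip(v, v)) * math.sqrt(ip(w, w)) + tol:
            return False
    return True

print(normalization_argument_holds())  # True
```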

Finally, to resolve the condition for equality, we only need to examine
our reasoning in reverse. If equality holds in the abstract Cauchy
inequality (1.16) for nonzero vectors $v$ and $w$, then the normalized
variables $\hat v$ and $\hat w$ are well defined. In terms of the normalized variables,
the equality of $\langle v, w\rangle$ and $\langle v, v\rangle^{1/2}\langle w, w\rangle^{1/2}$ tells us that $\langle \hat v, \hat w\rangle = 1$, and
this tells us in turn that $\langle \hat v - \hat w, \hat v - \hat w\rangle = 1 - 2 + 1 = 0$ simply by expansion of the
inner product. By property (ii) we deduce that $\hat v - \hat w = 0$; or, in other words,
$v = \lambda w$ where we set $\lambda = \langle v, v\rangle^{1/2}/\langle w, w\rangle^{1/2}$.
THE PACE OF SCIENCE — THE DEVELOPMENT OF EXTENSIONS


Augustin-Louis Cauchy (1789-1857) published his famous inequality 
in 1821 in the second of two notes on the theory of inequalities that 
formed the final part of his book Cours d’Analyse Algébrique, a volume
which was perhaps the world’s first rigorous calculus text. Oddly
enough, Cauchy did not use his inequality in his text, except in some 
illustrative exercises. The first time Cauchy’s inequality was applied 
in earnest by anyone was in 1829, when Cauchy used his inequality in 
an investigation of Newton’s method for the calculation of the roots of 
algebraic and transcendental equations. This eight-year gap provides 
an interesting gauge of the pace of science; now, each month, there are 
hundreds — perhaps thousands — of new scientific publications where 
Cauchy’s inequality is applied in one way or another. 

A great many of those applications depend on a natural analog of 
Cauchy’s inequality where sums are replaced by integrals, 


$$\int_a^b f(x)g(x)\,dx \le \Big(\int_a^b f^2(x)\,dx\Big)^{1/2}\Big(\int_a^b g^2(x)\,dx\Big)^{1/2}. \tag{1.19}$$


This bound first appeared in print in a Mémoire by Victor Yacovlevich 
Bunyakovsky which was published by the Imperial Academy of Sciences 
of St. Petersburg in 1859. Bunyakovsky (1804-1889) had studied in 
Paris with Cauchy, and he was quite familiar with Cauchy’s work on 
inequalities; so much so that by the time he came to write his Mémoire, 
Bunyakovsky was content to refer to the classical form of Cauchy’s inequality
for finite sums simply as well-known. Moreover, Bunyakovsky
did not dawdle over the limiting process; he took only a single line to 
pass from Cauchy’s inequality for finite sums to his continuous analog 
(1.19). By ironic coincidence, one finds that this analog is labelled as inequality
(C) in Bunyakovsky’s Mémoire, almost as though Bunyakovsky
had Cauchy in mind. 

Bunyakovsky’s Mémoire was written in French, but it does not seem 
to have circulated widely in Western Europe. In particular, it does not 
seem to have been known in Göttingen in 1885 when Hermann Amandus
Schwarz (1843-1921) was engaged in his fundamental work on the theory 
of minimal surfaces. 

In the course of this work, Schwarz had the need for a two-dimensional 
integral analog of Cauchy’s inequality. In particular, he needed to show 
that if $S \subset \mathbb{R}^2$ and $f : S \to \mathbb{R}$ and $g : S \to \mathbb{R}$, then the double integrals

$$A = \iint_S f^2\,dx\,dy, \qquad B = \iint_S fg\,dx\,dy, \qquad C = \iint_S g^2\,dx\,dy$$

must satisfy the inequality
$$|B| \le \sqrt{A}\cdot\sqrt{C}, \tag{1.20}$$


and Schwarz also needed to know that the inequality is strict unless the 
functions f and g are proportional. 

An approach to this result via Cauchy’s inequality would have been 
problematical for several reasons, including the fact that the strictness 
of a discrete inequality can be lost in the limiting passage to integrals. 
Thus, Schwarz had to look for an alternative path, and, faced with 
necessity, he discovered a proof whose charm has stood the test of time. 

Schwarz based his proof on one striking observation. Specifically, he
noted that the real polynomial

$$p(t) = \iint_S \big(t f(x, y) + g(x, y)\big)^2\,dx\,dy = At^2 + 2Bt + C$$

is always nonnegative, and, moreover, $p(t)$ is strictly positive unless $f$
and $g$ are proportional. Since a nonnegative quadratic cannot have two
distinct real roots, the quadratic formula then tells us that the
coefficients must satisfy $B^2 \le AC$, and unless $f$ and $g$ are proportional,
one actually has the strict inequality $B^2 < AC$. Thus, from a single
algebraic insight, Schwarz found everything that he needed to know.
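Schwarz’s argument can be mimicked with a discretized double integral. In the Python sketch below, the square $S = [0, 1]^2$, the grid size, and the functions $f(x, y) = x + y$ and $g(x, y) = e^{xy}$ are illustrative assumptions; the coefficients $A$, $B$, $C$ are midpoint-rule sums. The minimum of $p(t) = At^2 + 2Bt + C$, attained at $t = -B/A$ when $A > 0$, equals $C - B^2/A$, so its nonnegativity is the same statement as $B^2 \le AC$.

```python
import math

def schwarz_coefficients(f, g, n=200):
    """Midpoint-rule sums approximating A, B, C on the square S = [0, 1] x [0, 1]."""
    h = 1.0 / n
    A = B = C = 0.0
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) * h, (j + 0.5) * h
            fv, gv = f(x, y), g(x, y)
            A += fv * fv
            B += fv * gv
            C += gv * gv
    return A * h * h, B * h * h, C * h * h

def p(t, A, B, C):
    """Schwarz's quadratic p(t) = A t^2 + 2 B t + C."""
    return A * t * t + 2 * B * t + C

f = lambda x, y: x + y
g = lambda x, y: math.exp(x * y)
A, B, C = schwarz_coefficients(f, g)
t_star = -B / A                       # the minimizer of p, since A > 0 here
print(p(t_star, A, B, C) >= 0, B * B <= A * C)  # True True
```

Because the discrete sums are themselves sums of squares $(t f + g)^2$ with positive weights, the discretized $p(t)$ is nonnegative for the same reason as Schwarz’s integral.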

Schwarz’s proof requires the wisdom to consider the polynomial p(t), 
but, granted that step, the proof is lightning quick. Moreover, as one 
finds from Exercise 1.11, Schwarz’s argument can be used almost without 
change to prove the inner product form (1.16) of Cauchy’s inequality, 
and even there Schwarz’s argument provides one with a quick understanding
of the case of equality. Thus, there is little reason to wonder
why Schwarz’s argument has become a textbook favorite, even though 
it does require one to pull a rabbit — or at least a polynomial — out of 
a hat. 


THE NAMING OF THINGS — ESPECIALLY INEQUALITIES 


In light of the clear historical precedence of Bunyakovsky’s work over 
that of Schwarz, the common practice of referring to the bound (1.19) as 
Schwarz’s inequality may seem unjust. Nevertheless, by modern standards,
both Bunyakovsky and Schwarz might count themselves lucky to
have their names so closely associated with such a fundamental tool of 
mathematical analysis. Except in unusual circumstances, one garners 
little credit nowadays for crafting a continuous analog to a discrete inequality,
or vice versa. In fact, many modern problem solvers favor a
method of investigation where one rocks back and forth between discrete
and continuous analogs in search of the easiest approach to the
phenomena of interest. 

Ultimately, one sees that inequalities get their names in a great variety 
of ways. Sometimes the name is purely descriptive, such as one finds with 
the triangle inequality which we will meet shortly. Perhaps more often, 
an inequality is associated with the name of a mathematician, but even 
then there is no hard-and-fast rule to govern that association. Sometimes 
the inequality is named after the first finder, but other principles may 
apply — such as the framer of the final form, or the provider of the best 
known application. 

If one were to insist on the consistent use of the rule of first finder, then
Hölder’s inequality would become Rogers’s inequality, Jensen’s inequality
would become Hölder’s inequality, and only riotous confusion would
result. The most practical rule — and the one used here — is simply to 
use the traditional names. Nevertheless, from time to time, it may be 
scientifically informative to examine the roots of those traditions. 


EXERCISES 


Exercise 1.1 (The 1-Trick and the Splitting Trick) 
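Show that for each real sequence $a_1, a_2, \ldots, a_n$, one has

$$a_1 + a_2 + \cdots + a_n \le \sqrt{n}\,(a_1^2 + a_2^2 + \cdots + a_n^2)^{1/2} \tag{a}$$

and show that one also has
$$\sum_{k=1}^n a_k \le \Big(\sum_{k=1}^n |a_k|^{2/3}\Big)^{1/2}\Big(\sum_{k=1}^n |a_k|^{4/3}\Big)^{1/2}. \tag{b}$$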
The two tricks illustrated by this simple exercise will be our constant
companions throughout the course. We will meet them in almost countless
variations, and sometimes their implications are remarkably subtle.
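Both tricks of Exercise 1.1 can be spot-checked numerically before one looks for proofs. The Python sketch below (the function names are hypothetical) evaluates both sides of (a) and (b) for random real sequences of random length; passing such tests is, of course, no substitute for the proofs the exercise asks for.

```python
import math
import random

def one_trick_sides(a):
    """Sides of Exercise 1.1(a): a_1 + ... + a_n and sqrt(n) (sum a_k^2)^{1/2}."""
    return sum(a), math.sqrt(len(a)) * math.sqrt(sum(x * x for x in a))

def splitting_trick_sides(a):
    """Sides of Exercise 1.1(b), built from the powers |a_k|^{2/3} and |a_k|^{4/3}."""
    s23 = sum(abs(x) ** (2 / 3) for x in a)
    s43 = sum(abs(x) ** (4 / 3) for x in a)
    return sum(a), math.sqrt(s23) * math.sqrt(s43)

def both_tricks_hold(trials=1000, seed=5, tol=1e-9):
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(-10, 10) for _ in range(rng.randint(1, 12))]
        for lhs, rhs in (one_trick_sides(a), splitting_trick_sides(a)):
            if lhs > rhs + tol:
                return False
    return True

print(both_tricks_hold())  # True
```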


Exercise 1.2 (Products of Averages and Averages of Products) 

Suppose that $p_j > 0$ for all $j = 1, 2, \ldots, n$ and $p_1 + p_2 + \cdots + p_n = 1$.
Show that if $a_j$ and $b_j$ are nonnegative real numbers that satisfy the
termwise bound $1 \le a_jb_j$ for all $j = 1, 2, \ldots, n$, then one also has the
aggregate bound for the averages,

$$1 \le \Big\{\sum_{j=1}^n p_ja_j\Big\}\Big\{\sum_{j=1}^n p_jb_j\Big\}. \tag{1.21}$$

This graceful bound is often applied with $b_j = 1/a_j$. It also has a subtle
complement which is developed much later in Exercise 5.8.


Exercise 1.3 (Why Not Three or More?) 

Cauchy’s inequality provides an upper bound for a sum of pairwise 
products, and a natural sense of confidence is all one needs to guess 
that there are also upper bounds for the sums of products of three or 
more terms. In this exercise you are invited to justify two prototypical 
extensions. The first of these is definitely easy, and the second is not 
much harder, provided that you do not give it more respect than it 
deserves: 


Exercise 1.4 (Some Help From Symmetry) 

There are many situations where Cauchy’s inequality conspires with 
symmetry to provide results that are visually stunning. Here are two 
examples from a multitude of graceful possibilities. 

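(a) Show that for all positive $x, y, z$ one has

$$S = \Big(\frac{x + y}{x + y + z}\Big)^{1/2} + \Big(\frac{x + z}{x + y + z}\Big)^{1/2} + \Big(\frac{y + z}{x + y + z}\Big)^{1/2} \le 6^{1/2}.$$

(b) Show that for all positive $x, y, z$ one has

$$x + y + z \le 2\Big\{\frac{x^2}{y + z} + \frac{y^2}{x + z} + \frac{z^2}{x + y}\Big\}.$$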


Exercise 1.5 (A Crystallographic Inequality with a Message) 
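Recall that $f(x) = \cos(\beta x)$ satisfies the identity $f^2(x) = \tfrac{1}{2}(1 + f(2x))$,
and show that if $p_k \ge 0$ for $1 \le k \le n$ and $p_1 + p_2 + \cdots + p_n = 1$ then

$$g(x) = \sum_{k=1}^n p_k\cos(\beta_k x) \quad\text{satisfies}\quad g^2(x) \le \tfrac{1}{2}\{1 + g(2x)\}.$$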
This is known as the Harker–Kasper inequality, and it has far-reaching
consequences in crystallography. For the theory of inequalities, there is
an additional message of importance; given any functional identity one
should at least consider the possibility of an analogous inequality for a 
more extensive class of related functions, such as the class of mixtures 
used here. 
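A numerical spot check of the Harker–Kasper inequality takes only a few lines. In the Python sketch below, the number of terms, the ranges for $\beta_k$ and $x$, and the random weights are illustrative assumptions; the weights are normalized so that $p_1 + \cdots + p_n = 1$, as the exercise requires.

```python
import math
import random

def g(x, p, beta):
    """The mixture g(x) = sum_k p_k cos(beta_k x) of Exercise 1.5."""
    return sum(pk * math.cos(bk * x) for pk, bk in zip(p, beta))

def harker_kasper_holds(n=5, trials=2000, seed=4, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        p = [r / total for r in raw]                   # p_k >= 0, sum p_k = 1
        beta = [rng.uniform(-10, 10) for _ in range(n)]
        x = rng.uniform(-5, 5)
        if g(x, p, beta) ** 2 > 0.5 * (1 + g(2 * x, p, beta)) + tol:
            return False
    return True

print(harker_kasper_holds())  # True
```

With a single term ($n = 1$, $p_1 = 1$) the inequality reduces to the identity for $\cos^2$, so that case is a useful sanity check of the code.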


Exercise 1.6 (A Sum of Inversion Preserving Summands) 
Suppose that $p_k > 0$ for $1 \le k \le n$ and $p_1 + p_2 + \cdots + p_n = 1$. Show
that one has the bound

$$\sum_{k=1}^n \Big(p_k + \frac{1}{p_k}\Big)^2 \ge n^3 + 2n + 1/n,$$

and determine necessary and sufficient conditions for equality to hold
here. We will see later (Exercise 13.6, p. 206) that there are analogous
results for powers other than 2.


Exercise 1.7 (Flexibility of Form) 
Prove that for all real $x$, $y$, $\alpha$ and $\beta$ one has

$$(5\alpha x + \alpha y + \beta x + 3\beta y)^2 \le (5\alpha^2 + 2\alpha\beta + 3\beta^2)(5x^2 + 2xy + 3y^2). \tag{1.22}$$


More precisely, show that the bound (1.22) is an immediate corollary
of the Cauchy–Schwarz inequality (1.16) provided that one designs a
special inner product $\langle\cdot,\cdot\rangle$ for the job.


Exercise 1.8 (Doing the Sums) 

The effective use of Cauchy’s inequality often depends on knowing 
a convenient estimate for one of the bounding sums. Verify the four 
following classic bounds for real sequences: 


Yast < i ( Sat) for0<a2 <1, (a) 
} Be < (iowa) (S78) and (c) 




E(usQ\Eay a 


Exercise 1.9 (Beating the Obvious Bounds) 

Many problems of mathematical analysis depend on the discovery of 
bounds which are stronger than those one finds with the direct application
of Cauchy’s inequality. To illustrate the kind of opportunity one
might miss, show that for any real numbers $a_j$, $j = 1, 2, \ldots, n$, one has
the bound

$$\Big(\sum_{j=1}^n a_j\Big)^2 + \Big(\sum_{j=1}^n (-1)^j a_j\Big)^2 \le (n + 2)\sum_{j=1}^n a_j^2.$$

Here the direct application of Cauchy’s inequality gives a bound with
$2n$ instead of the value $n + 2$, so for large $n$ one does better by a factor
of nearly two.


Exercise 1.10 (Schur’s Lemma: The R and C Bound)

Show that for each rectangular array $\{c_{jk} : 1 \le j \le m,\ 1 \le k \le n\}$,
and each pair of sequences $\{x_j : 1 \le j \le m\}$ and $\{y_k : 1 \le k \le n\}$, we
have the bound

$$\Big|\sum_{j=1}^m \sum_{k=1}^n c_{jk}x_jy_k\Big| \le \sqrt{RC}\,\Big(\sum_{j=1}^m |x_j|^2\Big)^{1/2}\Big(\sum_{k=1}^n |y_k|^2\Big)^{1/2} \tag{1.23}$$

where $R$ and $C$ are the row sum and column sum maxima defined by

$$R = \max_j \sum_{k=1}^n |c_{jk}| \quad\text{and}\quad C = \max_k \sum_{j=1}^m |c_{jk}|.$$

This bound is known as Schur’s Lemma, but, ironically, it may be the
second most famous result with that name. Nevertheless, this inequality
is surely the single most commonly used tool for bounding a quadratic
form. One should note that in the extreme case when $n = m$, $c_{jk} = 0$ for $j \ne k$,
and $c_{jj} = 1$ for $1 \le j \le n$, Schur’s Lemma simply recovers Cauchy’s
inequality.
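Schur’s bound (1.23) is simple to test against random arrays. The Python sketch below (the function names and the random test arrays are illustrative assumptions) computes $R$ and $C$ exactly as defined above and compares both sides of (1.23); with the identity array it reduces to the plain Cauchy inequality, as the exercise notes.

```python
import math
import random

def schur_sides(c, x, y):
    """Left and right sides of Schur's bound (1.23); c is a list of m rows of length n."""
    m, n = len(c), len(c[0])
    lhs = abs(sum(c[j][k] * x[j] * y[k] for j in range(m) for k in range(n)))
    R = max(sum(abs(c[j][k]) for k in range(n)) for j in range(m))   # max row sum
    C = max(sum(abs(c[j][k]) for j in range(m)) for k in range(n))   # max column sum
    norm_x = math.sqrt(sum(t * t for t in x))
    norm_y = math.sqrt(sum(t * t for t in y))
    return lhs, math.sqrt(R * C) * norm_x * norm_y

def schur_holds(m=4, n=7, trials=500, seed=6, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        c = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
        x = [rng.uniform(-1, 1) for _ in range(m)]
        y = [rng.uniform(-1, 1) for _ in range(n)]
        lhs, rhs = schur_sides(c, x, y)
        if lhs > rhs + tol:
            return False
    return True

print(schur_holds())  # True
```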


Exercise 1.11 (Schwarz’s Argument in an Inner Product Space) 
Let $v$ and $w$ be elements of the inner product space $(V, \langle\cdot,\cdot\rangle)$ and
consider the quadratic polynomial defined for $t \in \mathbb{R}$ by

$$p(t) = \langle v + tw, v + tw\rangle.$$
Observe that this polynomial is nonnegative and use what you know
about the solution of the quadratic equation to prove the inner product 
version (1.16) of Cauchy’s inequality. Also, examine the steps of your 
proof to establish the conditions under which the case of equality can 
apply. Thus, confirm that Schwarz’s argument (page 11) applies almost 
without change to prove Cauchy’s inequality for a general inner product. 


Exercise 1.12 (Example of a Self-generalization) 

Let $\langle\cdot,\cdot\rangle$ denote an inner product on the vector space $V$ and suppose
that $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ are sequences of elements of $V$.
Prove that one has the following vector analog of Cauchy’s inequality:

$$\sum_{j=1}^n \langle x_j, y_j\rangle \le \Big(\sum_{j=1}^n \langle x_j, x_j\rangle\Big)^{1/2}\Big(\sum_{j=1}^n \langle y_j, y_j\rangle\Big)^{1/2}. \tag{1.24}$$

Note that if one takes $n = 1$, then this bound simply recaptures the
Cauchy–Schwarz inequality for an inner product space, while, if one
keeps $n$ general but specializes the vector space $V$ to be $\mathbb{R}$ with the trivial
inner product $\langle x, y\rangle = xy$, then the bound (1.24) simply recaptures the
plain vanilla Cauchy inequality.


Exercise 1.13 (Application of Cauchy’s Inequality to an Array) 


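Show that if $\{a_{jk} : 1 \le j \le m,\ 1 \le k \le n\}$ is an array of real numbers
then one has

$$m\sum_{j=1}^m \Big(\sum_{k=1}^n a_{jk}\Big)^2 + n\sum_{k=1}^n \Big(\sum_{j=1}^m a_{jk}\Big)^2 \le \Big(\sum_{j=1}^m\sum_{k=1}^n a_{jk}\Big)^2 + mn\sum_{j=1}^m\sum_{k=1}^n a_{jk}^2.$$

Moreover, show that equality holds here if and only if there exist $\alpha_j$ and
$\beta_k$ such that $a_{jk} = \alpha_j + \beta_k$ for all $1 \le j \le m$ and $1 \le k \le n$.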


Exercise 1.14 (A Cauchy Triple and Loomis–Whitney)

Here is a generalization of Cauchy’s inequality that has as a corollary
a discrete version of the Loomis–Whitney inequality, a result which in
the continuous case provides a bound on the volume of a set in terms
of the volumes of the projections of that set onto lower dimensional
subspaces. The discrete Loomis–Whitney inequality (1.26) was only
recently developed, and it has applications in information theory and
the theory of algorithms.

(a) Show that for any nonnegative $a_{ij}$, $b_{jk}$, $c_{ki}$ with $1 \le i, j, k \le n$ one
has the triple product inequality

$$\sum_{i,j,k=1}^n a_{ij} b_{jk} c_{ki} \le \Big( \sum_{i,j=1}^n a_{ij}^2 \Big)^{1/2} \Big( \sum_{j,k=1}^n b_{jk}^2 \Big)^{1/2} \Big( \sum_{k,i=1}^n c_{ki}^2 \Big)^{1/2}. \qquad (1.25)$$

[Figure 1.1: a cubic arrangement of points. Here we have a set $A$ with cardinality $|A| = 27$ with projections that satisfy $|A_x| = |A_y| = |A_z| = 9$.]

Fig. 1.1. The discrete Loomis–Whitney inequality says that for any collection
$A$ of points in $\mathbb{R}^3$ one has $|A| \le |A_x|^{1/2} |A_y|^{1/2} |A_z|^{1/2}$. The cubic arrangement
indicated here suggests the canonical situation where one finds the case of
equality in the bound.


(b) Let $A$ denote a finite set of points in $\mathbb{Z}^3$ and let $A_x$, $A_y$, $A_z$ denote
the projections of $A$ onto the corresponding coordinate planes that are
orthogonal to the $x$-, $y$-, or $z$-axes. Let $|B|$ denote the cardinality of a set
$B \subset \mathbb{Z}^3$ and show that the projections provide an upper bound on the
cardinality of $A$:

$$|A| \le |A_x|^{1/2} |A_y|^{1/2} |A_z|^{1/2}. \qquad (1.26)$$
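For small examples, the bound (1.26) can be checked by brute force. The Python sketch below (helper names are hypothetical) compares $|A|^2$ with $|A_x|\,|A_y|\,|A_z|$, which is (1.26) squared and keeps everything in exact integers, for random subsets of a small lattice box; it then confirms that the $3 \times 3 \times 3$ cube of Figure 1.1 gives equality.

```python
import itertools
import random


def projections(points):
    """Projections of a finite set of lattice points onto the three coordinate planes."""
    ax = {(y, z) for (x, y, z) in points}   # plane orthogonal to the x-axis
    ay = {(x, z) for (x, y, z) in points}   # plane orthogonal to the y-axis
    az = {(x, y) for (x, y, z) in points}   # plane orthogonal to the z-axis
    return ax, ay, az


def loomis_whitney_holds(points):
    """Check |A|^2 <= |A_x| |A_y| |A_z|, the squared form of (1.26)."""
    ax, ay, az = projections(points)
    return len(points) ** 2 <= len(ax) * len(ay) * len(az)


random.seed(3)
for _ in range(300):
    pts = {tuple(random.randrange(4) for _ in range(3)) for _ in range(random.randint(1, 40))}
    assert loomis_whitney_holds(pts)

cube = set(itertools.product(range(3), repeat=3))
print(len(cube), [len(p) for p in projections(cube)])  # 27 [9, 9, 9]
```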


Exercise 1.15 (An Application to Statistical Theory) 
If $p(k; \theta) > 0$ for all $k \in D$ and $\theta \in \Theta$ and if

$$\sum_{k \in D} p(k; \theta) = 1 \quad \text{for all } \theta \in \Theta, \qquad (1.27)$$

then for each $\theta \in \Theta$ one can think of $M_\theta = \{p(k; \theta) : k \in D\}$ as
specifying a probability model where $p(k; \theta)$ represents the probability
that we “observe $k$” when the parameter $\theta$ is the true “state of nature.”
If the function $g : D \to \mathbb{R}$ satisfies

$$\sum_{k \in D} g(k) p(k; \theta) = \theta \quad \text{for all } \theta \in \Theta, \qquad (1.28)$$

then $g$ is called an unbiased estimator of the parameter $\theta$. Assuming
that $D$ is finite and $p(k; \theta)$ is a differentiable function of $\theta$, show that
one has the lower bound

$$\sum_{k \in D} (g(k) - \theta)^2 p(k; \theta) \ge \frac{1}{I(\theta)}, \qquad (1.29)$$

where $I : \Theta \to \mathbb{R}$ is defined by the sum

$$I(\theta) = \sum_{k \in D} \big\{ p_\theta(k; \theta) / p(k; \theta) \big\}^2 p(k; \theta), \qquad (1.30)$$

where $p_\theta(k; \theta) = \partial p(k; \theta) / \partial \theta$. The quantity defined by the left side of
the bound (1.29) is called the variance of the unbiased estimator $g$, and
the quantity $I(\theta)$ is known as the Fisher information at $\theta$ of the model
$M_\theta$. The inequality (1.29) is known as the Cramér–Rao lower bound,
and it has extensive applications in mathematical statistics.
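To see the bound in a familiar case, take the binomial model $p(k; \theta) = \binom{N}{k} \theta^k (1-\theta)^{N-k}$ on $D = \{0, 1, \ldots, N\}$ with $\Theta = (0, 1)$ and the estimator $g(k) = k/N$. This example is our own illustrative choice, not part of the exercise. Here $p_\theta / p = k/\theta - (N-k)/(1-\theta)$, and the variance $\theta(1-\theta)/N$ coincides with $1/I(\theta)$, so the Cramér–Rao bound holds with equality. The Python sketch evaluates the sums in (1.28), (1.29), and (1.30) directly.

```python
from math import comb


def binomial_pmf(k, N, theta):
    """p(k; theta) for the binomial model with N trials (illustrative choice)."""
    return comb(N, k) * theta**k * (1 - theta) ** (N - k)


def score(k, N, theta):
    """p_theta(k; theta) / p(k; theta) for the binomial model."""
    return k / theta - (N - k) / (1 - theta)


def cramer_rao(N, theta):
    """Return (mean of g, variance of g, 1 / I(theta)) for the estimator g(k) = k / N."""
    ks = range(N + 1)
    probs = [binomial_pmf(k, N, theta) for k in ks]
    mean = sum((k / N) * p for k, p in zip(ks, probs))               # left side of (1.28)
    var = sum((k / N - theta) ** 2 * p for k, p in zip(ks, probs))   # left side of (1.29)
    info = sum(score(k, N, theta) ** 2 * p for k, p in zip(ks, probs))  # I(theta), (1.30)
    return mean, var, 1 / info


mean, var, bound = cramer_rao(10, 0.3)
# Up to rounding: mean = 0.3 and var = bound = 0.3 * 0.7 / 10.
```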


2 


Cauchy’s Second Inequality: 
The AM-GM Bound 


Our initial discussion of Cauchy’s inequality pivoted on the application
of the elementary real variable inequality

$$xy \le \frac{x^2}{2} + \frac{y^2}{2} \quad \text{for all } x, y \in \mathbb{R}, \qquad (2.1)$$

and one may rightly wonder how so much value can be drawn from a
bound which comes from the trivial observation that $(x - y)^2 \ge 0$. Is it
possible that the humble bound (2.1) has a deeper physical or geometric
interpretation that might reveal the reason for its effectiveness?

For nonnegative $x$ and $y$, the direct term-by-term interpretation of
the inequality (2.1) simply says that the area of the rectangle with sides
$x$ and $y$ is never greater than the average of the areas of the two squares
with sides $x$ and $y$, and although this interpretation is modestly interesting,
one can do much better with just a small change. If we first replace
$x$ and $y$ by their square roots, then the bound (2.1) gives us

$$\sqrt{xy} \le \frac{x}{2} + \frac{y}{2} \quad \text{for all nonnegative } x \text{ and } y, \qquad (2.2)$$

and this inequality has a much richer interpretation.

Specifically, suppose we consider the set of all rectangles with area $A$
and side lengths $x$ and $y$. Since $A = xy$, the inequality (2.2) tells us that
a square with sides of length $s = \sqrt{xy}$ must have the smallest perimeter
among all rectangles with area $A$. Equivalently, the inequality tells us
that among all rectangles with perimeter $p$, the square with side $s = p/4$
alone attains the maximal area.

Thus, the inequality (2.2) is nothing less than a rectangular version of
the famous isoperimetric property of the circle, which says that among
all planar regions with perimeter $p$, the circle of circumference $p$ has the
largest area. We now see more clearly why $xy \le x^2/2 + y^2/2$ might be
powerful; it is part of that great stream of results that links symmetry
and optimality.


FROM SQUARES TO n-CUBES 


One advantage that comes from the isoperimetric interpretation of 
the bound $\sqrt{xy} \le (x + y)/2$ is the boost that it provides to our
intuition. Human beings are almost hardwired with a feeling for
geometrical truths, and one can easily conjecture many plausible analogs of the
bound $\sqrt{xy} \le (x + y)/2$ in two, three, or more dimensions.

Perhaps the most natural of these analogs is the assertion that the 
cube in $\mathbb{R}^3$ has the largest volume among all boxes (i.e., rectangular
parallelepipeds) that have a given surface area. This intuitive result is 
developed in Exercise 2.9, but our immediate goal is a somewhat different 
generalization — one with a multitude of applications. 

A box in $\mathbb{R}^n$ has $2^n$ corners, and each of those corners is incident to
$n$ edges of the box. If we let the lengths of those edges be $a_1, a_2, \ldots, a_n$,
then the same isoperimetric intuition that we have used for squares and
cubes suggests that the $n$-cube with edge length $S/n$ will have the largest
volume among all boxes for which $a_1 + a_2 + \cdots + a_n = S$. The next
challenge problem offers an invitation to find an honest proof of this 
intuitive claim. It also recasts this geometric conjecture in the more 
common analytic language of arithmetic and geometric means. 


Problem 2.1 (Arithmetic Mean-Geometric Mean Inequality) 
Show that for every sequence of nonnegative real numbers $a_1, a_2, \ldots, a_n$
one has

$$(a_1 a_2 \cdots a_n)^{1/n} \le \frac{a_1 + a_2 + \cdots + a_n}{n}. \qquad (2.3)$$


FROM CONJECTURE TO CONFIRMATION 

For $n = 2$, the inequality (2.3) follows directly from the elementary
bound $\sqrt{xy} \le (x + y)/2$ that we have just discussed. One then needs
just a small amount of luck to notice (as Cauchy did long ago) that the
same bound can be applied twice to obtain

$$(a_1 a_2 a_3 a_4)^{1/4} \le \frac{(a_1 a_2)^{1/2} + (a_3 a_4)^{1/2}}{2} \le \frac{a_1 + a_2 + a_3 + a_4}{4}. \qquad (2.4)$$

This inequality confirms the conjecture (2.3) when $n = 4$, and the new
bound (2.4) can be used again with $\sqrt{xy} \le (x + y)/2$ to find that

$$(a_1 a_2 \cdots a_8)^{1/8} \le \frac{(a_1 a_2 a_3 a_4)^{1/4} + (a_5 a_6 a_7 a_8)^{1/4}}{2} \le \frac{a_1 + a_2 + \cdots + a_8}{8},$$

which confirms the conjecture (2.3) for $n = 8$.
Clearly, we are on a roll. Without missing a beat, one can repeat this 
argument k times (or use induction) to deduce that 


$$(a_1 a_2 \cdots a_{2^k})^{1/2^k} \le (a_1 + a_2 + \cdots + a_{2^k})/2^k \quad \text{for all } k \ge 1. \qquad (2.5)$$


The bottom line is that we have proved the target inequality for all 
$n = 2^k$, and all one needs now is just some way to fill the gaps between
the powers of two. 

The natural plan is to take an $n < 2^k$ and to look for some way to use
the $n$ numbers $a_1, a_2, \ldots, a_n$ to define a longer sequence $\alpha_1, \alpha_2, \ldots, \alpha_{2^k}$
to which we can apply the inequality (2.5). The discovery of an effective
choice for the values of the sequence $\{\alpha_i\}$ may call for some exploration,
but one is not likely to need too long to hit on the idea of setting $\alpha_i = a_i$
for $1 \le i \le n$ and setting

$$\alpha_i = \frac{a_1 + a_2 + \cdots + a_n}{n} \equiv A \quad \text{for } n < i \le 2^k;$$

in other words, we simply pad the original sequence $\{a_i : 1 \le i \le n\}$ with
enough copies of the average $A$ to give us a sequence $\{\alpha_i : 1 \le i \le 2^k\}$
that has length equal to $2^k$.

The average $A$ is listed $2^k - n$ times in the padded sequence $\{\alpha_i\}$, so,
when we apply inequality (2.5) to $\{\alpha_i\}$, we find

$$\big\{ a_1 a_2 \cdots a_n \cdot A^{2^k - n} \big\}^{1/2^k} \le \frac{a_1 + a_2 + \cdots + a_n + (2^k - n)A}{2^k} = \frac{2^k A}{2^k} = A.$$

Now, if we clear the powers of $A$ to the right-hand side, then we find

$$(a_1 a_2 \cdots a_n)^{1/2^k} \le A^{n/2^k},$$


and, if we then raise both sides to the power $2^k/n$, we come precisely to
our target inequality,

$$(a_1 a_2 \cdots a_n)^{1/n} \le \frac{a_1 + a_2 + \cdots + a_n}{n}. \qquad (2.6)$$
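The padding step is easy to trace numerically. In the Python sketch below, the helper `padded_means` is a hypothetical name: it pads a list with copies of its mean $A$ up to the next power of two and reports the geometric and arithmetic means of the padded list. The arithmetic mean of the padded list is exactly $A$, and the power-of-two bound (2.5) then places the padded geometric mean below $A$.

```python
import math


def padded_means(a):
    """Cauchy's padding step: extend a with copies of its mean A to length 2^k.

    Returns (A, geometric mean of the padded list, arithmetic mean of the padded list).
    """
    n = len(a)
    A = sum(a) / n
    k = max(1, (n - 1).bit_length())      # smallest k >= 1 with 2^k >= n
    padded = list(a) + [A] * (2**k - n)
    gm = math.prod(padded) ** (1 / 2**k)
    am = sum(padded) / 2**k               # equals A, since the padding adds (2^k - n) A
    return A, gm, am


a = [1.0, 2.0, 3.0, 4.0, 5.0]
A, gm, am = padded_means(a)
# (2.5) gives gm <= am = A; raising to the power 2^k / n recovers (2.6).
assert gm <= am and abs(am - A) < 1e-12
assert math.prod(a) ** (1 / len(a)) <= A
```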


A SELF-GENERALIZING STATEMENT 


The AM-GM inequality (2.6) has an instructive self-generalizing quality.
Almost without help, it pulls itself up by the bootstraps to a new
result which covers cases that were left untouched by the original. Under
normal circumstances, this generalization might seem to be too easy to
qualify as a challenge problem, but the final result is so important that the
problem easily clears the hurdle.
Problem 2.2 (The AM-GM Inequality with Rational Weights)


Suppose that $p_1, p_2, \ldots, p_n$ are nonnegative rational numbers that sum
to one, and show that for any nonnegative real numbers $a_1, a_2, \ldots, a_n$
one has

$$a_1^{p_1} a_2^{p_2} \cdots a_n^{p_n} \le p_1 a_1 + p_2 a_2 + \cdots + p_n a_n. \qquad (2.7)$$


Once one asks what role the rationality of the $p_j$ might play, the
solution presents itself soon enough. If we take an integer $M$ so that for
each $j$ we can write $p_j = k_j/M$ for an integer $k_j$, then one finds that the
ostensibly more general version (2.7) of the AM-GM follows from the
original version (2.3) of the AM-GM applied to a sequence of length $M$
with lots of repetition. One just takes the sequence with $k_j$ copies of $a_j$
for each $1 \le j \le n$ and then applies the plain vanilla AM-GM inequality
(2.3); there is nothing more to it, or, at least there is nothing more if we
attend strictly to the stated problem.
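The repetition trick can be made concrete with exact rational weights. In the Python sketch below (the helper `repeated_sequence` is a hypothetical name, and `math.lcm` with several arguments needs Python 3.9 or later), the common denominator $M$ of the weights gives integers $k_j = M p_j$. The plain AM-GM inequality applied to the list with $k_j$ copies of $a_j$ reproduces both sides of (2.7).

```python
from fractions import Fraction
from math import lcm, prod


def repeated_sequence(a, p):
    """Given rational weights p (Fractions summing to 1), return M and the list
    containing k_j = M * p_j copies of each a_j."""
    assert sum(p) == 1
    M = lcm(*(q.denominator for q in p))     # p_j = k_j / M with integer k_j
    ks = [int(q * M) for q in p]
    seq = [x for x, k in zip(a, ks) for _ in range(k)]
    return M, seq


a = [2.0, 5.0, 9.0]
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
M, seq = repeated_sequence(a, p)             # M = 6, seq = [2.0, 2.0, 2.0, 5.0, 5.0, 9.0]

# Plain AM-GM (2.3) on seq is exactly the weighted AM-GM (2.7) on a.
plain_gm, plain_am = prod(seq) ** (1 / M), sum(seq) / M
weighted_gm = prod(x ** float(q) for x, q in zip(a, p))
weighted_am = sum(float(q) * x for x, q in zip(a, p))
assert abs(plain_gm - weighted_gm) < 1e-12 and abs(plain_am - weighted_am) < 1e-12
assert plain_gm <= plain_am
```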

Nevertheless, there is a further observation one can make. Once the 
result (2.7) is established for rational values, the same inequality follows
for general values of $p_j$ “just by taking limits.” In detail, we first choose
a sequence of rational numbers $p_j(t)$, $j = 1, 2, \ldots, n$ and $t = 1, 2, \ldots$ for which
we have

$$p_j(t) \ge 0, \quad \sum_{j=1}^n p_j(t) = 1, \quad \text{and} \quad \lim_{t \to \infty} p_j(t) = p_j.$$

One then applies the bound (2.7) to the $n$-tuples $(p_1(t), p_2(t), \ldots, p_n(t))$,
and, finally, one lets $t$ go to infinity to get the general result.

The technique of proving an inequality first for rationals and then 
extending to reals is often useful, but it does have some drawbacks. For 
example, the strictness of an inequality may be lost as one passes to a 
limit, so the technique may leave us without a clear understanding of
the case of equality. Sometimes this loss is unimportant, but for a tool 
as fundamental as the general AM-GM inequality, the conditions for 
equality are important. One would prefer a proof that handles all the 
features of the inequality in a unified way, and there are several pleasing 
alternatives to the method of rational approximation. 


PÓLYA’S DREAM AND A PATH OF REDISCOVERY


The AM-GM inequality turns out to have a remarkable number of
proofs, and even though Cauchy’s proof via the imaginative leap-forward
fall-back induction is a priceless part of the world’s mathematical
inheritance, some of the alternative proofs are just as well loved. One
particularly charming proof is due to George Pólya, who reported that
the proof came to him in a dream. In fact, when asked about his proof
years later, Pólya replied that it was the best mathematics he had ever
dreamt.

[Figure 2.1: the curve $y = e^x$ and its tangent line $y = 1 + x$; the vertical axis is marked at $e = 2.71828\ldots$]

Fig. 2.1. The line $y = 1 + x$ is tangent to the curve $y = e^x$ at the point $x = 0$,
and the line is below the curve for all $x \in \mathbb{R}$. Thus, we have $1 + x \le e^x$ for
all $x \in \mathbb{R}$, and, moreover, the inequality is strict except when $x = 0$. Here
one should note that the $y$-axis has been scaled so that $e$ is the unit; thus, the
divergence of the two functions is more rapid than the figure may suggest.

Like Cauchy, Pólya begins his proof with a simple observation about a
nonnegative function, except Pólya calls on the function $x \mapsto e^x$ rather
than the function $x \mapsto x^2$. The graph of $y = e^x$ in Figure 2.1 illustrates
the property of $y = e^x$ that is the key to Pólya’s proof; specifically, it
shows that the tangent line $y = 1 + x$ runs below the curve $y = e^x$, so
one has the bound

$$1 + x \le e^x \quad \text{for all } x \in \mathbb{R}. \qquad (2.8)$$


Naturally, there are analytic proofs of this inequality; for example, Ex- 
ercise 2.2 suggests a proof by induction, but the evidence of Figure 2.1 
is all one needs to move to the next challenge. 


Problem 2.3 (The General AM-GM Inequality) 
Take the hint of exploiting the exponential bound, and discover Pólya’s
proof for yourself; that is, show that the inequality (2.8) implies that

$$a_1^{p_1} a_2^{p_2} \cdots a_n^{p_n} \le p_1 a_1 + p_2 a_2 + \cdots + p_n a_n \qquad (2.9)$$

for nonnegative real numbers $a_1, a_2, \ldots, a_n$ and each sequence $p_1, p_2, \ldots, p_n$
of positive real numbers which sums to one.
In the AM-GM inequality (2.9) the left-hand side contains a product
of terms, and the analytic inequality $1 + x \le e^x$ stands ready to bound
such a product by the exponential of a sum. Moreover, there are two
ways to exploit this possibility; we could write the multiplicands $a_k$ in
the form $1 + x_k$ and then apply the analytic inequality (2.8), or we could
modify the inequality (2.8) so that it applies directly to the $a_k$. In
practice, one would surely explore both ideas, but for the moment, we
will focus on the second plan.

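If one makes the change of variables $x \mapsto x - 1$, then the exponential
bound (2.8) becomes

$$x \le e^{x-1} \quad \text{for all } x \in \mathbb{R}, \qquad (2.10)$$

and if we apply this bound to the multiplicands $a_k$, $k = 1, 2, \ldots, n$, we find

$$a_k \le e^{a_k - 1} \quad \text{and} \quad a_k^{p_k} \le e^{p_k a_k - p_k}.$$

When we take the product, we find that the geometric mean $a_1^{p_1} a_2^{p_2} \cdots a_n^{p_n}$
is bounded above by

$$R(a_1, a_2, \ldots, a_n) = \exp\Big( \Big\{ \sum_{k=1}^n p_k a_k \Big\} - 1 \Big). \qquad (2.11)$$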


We may be pleased to know that the geometric mean $G = a_1^{p_1} a_2^{p_2} \cdots a_n^{p_n}$
is bounded by $R$, but we really cannot be too thrilled until we understand
how $R$ compares with the arithmetic mean

$$A = p_1 a_1 + p_2 a_2 + \cdots + p_n a_n,$$

and this is where the problem gets interesting.


A MODEST PARADOX


When we ask ourselves about a possible relation between $A$ and $R$,
one answer comes quickly. From the bound $A \le e^{A-1}$ one sees that $R$
is also an upper bound on the arithmetic mean $A$, so, all in one package,
we have the double bound

$$\max\big\{ a_1^{p_1} a_2^{p_2} \cdots a_n^{p_n}, \; p_1 a_1 + p_2 a_2 + \cdots + p_n a_n \big\} \le \exp\Big( \Big\{ \sum_{k=1}^n p_k a_k \Big\} - 1 \Big). \qquad (2.12)$$

This inequality now presents us with a task which is at least
a bit paradoxical. Can it really be possible to establish an inequality
between two quantities when all one has is an upper bound on their
maximum?
MEETING THE CHALLENGE


While we might be discouraged for a moment, we should not give 
up too quickly. We should at least think long enough to notice that the 
bound (2.12) does provide a relationship between A and G in the special 
case when one of the two maximands on the left-hand side is equal to the 
term on the right-hand side. Perhaps we can exploit this observation. 

Once this is said, the familiar notion of normalization is likely to come 
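to mind. Thus, if we consider the new variables $\alpha_k$, $k = 1, 2, \ldots, n$,
defined by the ratios

$$\alpha_k = \frac{a_k}{A} \quad \text{where } A = p_1 a_1 + p_2 a_2 + \cdots + p_n a_n$$

(where we may assume $A > 0$, since with positive weights $A = 0$ forces every
$a_k$ to vanish and (2.9) is then trivial), and if we apply the bound (2.11) to
these new variables, then we find

$$\Big(\frac{a_1}{A}\Big)^{p_1} \Big(\frac{a_2}{A}\Big)^{p_2} \cdots \Big(\frac{a_n}{A}\Big)^{p_n} \le \exp\Big( \Big\{ \sum_{k=1}^n p_k \frac{a_k}{A} \Big\} - 1 \Big) = \exp(1 - 1) = 1.$$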


After we clear the multiples of $A$ to the right side and recall that one
has $p_1 + p_2 + \cdots + p_n = 1$, we see that the proof of the general AM-GM
inequality (2.9) is complete.
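The normalization step is easy to test numerically. In this Python sketch (the function name is hypothetical, and the weights are assumed positive with $A > 0$), the quantity $\exp(\{\sum_k p_k a_k/A\} - 1)$ comes out equal to 1 up to rounding. The bound (2.11) applied to $\alpha_k = a_k/A$ therefore reads $G/A \le 1$.

```python
import math
import random


def normalized_ratio(a, p):
    """Return (G/A, exp(sum_k p_k (a_k/A) - 1)) for positive weights p summing to 1."""
    A = sum(pk * ak for pk, ak in zip(p, a))
    alpha = [ak / A for ak in a]                                  # the normalized variables
    g_over_a = math.prod(x**pk for x, pk in zip(alpha, p))
    r = math.exp(sum(pk * x for pk, x in zip(p, alpha)) - 1)     # equals 1 by construction
    return g_over_a, r


random.seed(4)
for _ in range(1000):
    n = random.randint(1, 6)
    a = [random.uniform(0.01, 10) for _ in range(n)]
    w = [random.uniform(0.01, 1) for _ in range(n)]
    p = [x / sum(w) for x in w]
    g_over_a, r = normalized_ratio(a, p)
    assert abs(r - 1) < 1e-12 and g_over_a <= 1 + 1e-12
```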

A FIRST LOOK BACK


When we look back on this proof of the AM-GM inequality (2.9),
one of the virtues that we find is that it offers us a convenient way to
identify the circumstances under which we can have equality; namely, if
we examine the first step we see that we have

$$\frac{a_k}{A} < e^{a_k/A - 1} \quad \text{unless } a_k = A, \qquad (2.13)$$

and we always have

$$\frac{a_k}{A} \le e^{a_k/A - 1},$$

so we see that one also has

$$\Big(\frac{a_1}{A}\Big)^{p_1} \Big(\frac{a_2}{A}\Big)^{p_2} \cdots \Big(\frac{a_n}{A}\Big)^{p_n} < \exp\Big( \sum_{k=1}^n p_k \Big( \frac{a_k}{A} - 1 \Big) \Big) = e^0 = 1 \qquad (2.14)$$

unless $a_k = A$ for all $k = 1, 2, \ldots, n$. In other words, we find that one
has equality in the AM-GM inequality (2.9) if and only if

$$a_1 = a_2 = \cdots = a_n.$$


Looking back, we also see that the two lines (2.13) and (2.14) actually
contain a full proof of the general AM-GM inequality. One could even
argue with good reason that the single line (2.13) is all the proof that
one really needs.


A LONGER LOOK BACK 


This identification of the case of equality in the AM-GM bound may 
appear to be only an act of convenient tidiness, but there is much more 
to it. There is real power to be gained from understanding when an 
inequality is most effective, and we have already seen two examples of 
the energy that may be released by exploiting the case of equality. 


When one compares the way that the AM-GM inequality was
extracted from the bound $1 + x \le e^x$ with the way that Cauchy’s inequality
was extracted from the bound $xy \le x^2/2 + y^2/2$, one may be struck by
the effective role played by normalization, even though the
normalizations were of quite different kinds. Is there some larger principle afoot
here, or is this just a minor coincidence?


There is more than one answer to this question, but an observation
that seems pertinent is that normalization often helps us focus the
application of an inequality on the point (or the region) where the inequality
is most effective. For example, in the derivation of the AM-GM
inequality from the bound $1 + x \le e^x$, the normalizations let us focus in the
final step on the point $x = 0$, and this is precisely where $1 + x \le e^x$
is sharp. Similarly, in the last step of the proof of Cauchy’s inequality
for inner products, normalization essentially brought us to the case of
$x = y = 1$ in the two variable bound $xy \le x^2/2 + y^2/2$, and again this
is precisely where the bound is sharp.


These are not isolated examples. In fact, they are pointers to one of 
the most prevalent themes in the theory of inequalities. Whenever we 
hope to apply some underlying inequality to a new problem, the success 
or failure of the application will often depend on our ability to recast 
the problem so that the inequality is applied in one of those pleasing 
circumstances where the inequality is sharp, or nearly sharp. 


In the cases we have seen so far, normalization helped us reframe 
our problems so that an underlying inequality could be applied more 
efficiently, but sometimes one must go to greater lengths. The next 
challenge problem recalls what may be one of the finest illustrations of 
this fight in all of the mathematical literature; it has inspired generations 
of mathematicians. 
PÓLYA’S COACHING AND CARLEMAN’S INEQUALITY


In 1923, as the first step in a larger project, Torsten Carleman proved a 
remarkable inequality which over time has come to serve as a benchmark 
for many new ideas and methods. In 1926 George Pólya gave an elegant
proof of Carleman’s inequality that depended on little more than the 
AM-GM inequality. 

The secret behind Pólya’s proof was his reliance on the general
principle that one should try to use an inequality where it is most effective.
The next challenge problem invites you to explore Carleman’s inequality
and to see if with a few hints you might also discover Pólya’s proof.


Problem 2.4 (Carleman’s Inequality) 
Show that for each sequence of positive real numbers $a_1, a_2, \ldots$ one has
the inequality

$$\sum_{k=1}^\infty (a_1 a_2 \cdots a_k)^{1/k} \le e \sum_{k=1}^\infty a_k, \qquad (2.15)$$

where $e$ denotes the natural base $2.71828\ldots$.

Our experience with the series version of Cauchy’s inequality suggests 
that a useful way to approach a quantitative result such as the bound 
(2.15) is to first consider a simpler qualitative problem such as showing 


$$\sum_{k=1}^\infty a_k < \infty \implies \sum_{k=1}^\infty (a_1 a_2 \cdots a_k)^{1/k} < \infty. \qquad (2.16)$$


Here, in the natural course of events, one would apply the AM-GM
inequality to the summands on the right, do honest calculations, and
hope for good luck. This plan leads one to the bound

$$\sum_{k=1}^n (a_1 a_2 \cdots a_k)^{1/k} \le \sum_{k=1}^n \frac{1}{k} \sum_{j=1}^k a_j = \sum_{j=1}^n a_j \sum_{k=j}^n \frac{1}{k},$$

and, with no great surprise, we find that the plan does not work. As
$n \to \infty$ our upper bound diverges, and we find that the naive application
of the AM-GM inequality has left us empty-handed.
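The failure of the naive bound is easy to see numerically. The Python sketch below uses the illustrative summable sequence $a_j = 1/j^2$ (our own choice, not from the text; the helper names are hypothetical). The partial sums of the geometric means settle down, while the naive bound keeps growing, since it is at least $a_1(1 + 1/2 + \cdots + 1/n)$.

```python
import math


def a(j):
    """Illustrative summable sequence a_j = 1 / j^2 (an assumption for this sketch)."""
    return 1.0 / j**2


def naive_bound(n):
    """sum_{j<=n} a_j * sum_{k=j}^{n} 1/k, the bound produced by the unweighted AM-GM step."""
    tail, total = 0.0, 0.0
    for j in range(n, 0, -1):             # build the harmonic tails from the top down
        tail += 1.0 / j
        total += a(j) * tail
    return total


def gm_partial_sum(n):
    """sum_{k<=n} (a_1 ... a_k)^{1/k}, computed through logarithms to avoid underflow."""
    log_prod, s = 0.0, 0.0
    for k in range(1, n + 1):
        log_prod += math.log(a(k))
        s += math.exp(log_prod / k)
    return s


for n in (10, 1_000, 100_000):
    print(n, gm_partial_sum(n), naive_bound(n))
```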

Naturally, this failure was to be expected since this challenge problem 
is intended to illustrate the principle of maximal effectiveness whereby 
we conspire to use our tools under precisely those circumstances when 
they are at their best. Thus, to meet the real issue, we need to ask 
ourselves why the AM-GM bound failed us and what we might do to 
overcome that failure. 
PURSUIT OF A PRINCIPLE


By the hypothesis on the left-hand side of the implication (2.16), the
sum $a_1 + a_2 + \cdots$ converges, and this modest fact may suggest the likely
source of our difficulties. Convergence implies that in any long block
$a_1, a_2, \ldots, a_n$ there must be terms that are “highly unequal,” and we
know that in such a case the AM-GM inequality can be highly inefficient.
Can we find some way to make our application of the AM-GM bound 
more forceful? More precisely, can we direct our application of the AM- 
GM bound toward some sequence with terms that are more nearly equal? 

Since we know very little about the individual terms, we do not know
precisely what to do, but one may well not need long to think of
multiplying each $a_k$ by some fudge factor $c_k$, which we can try to specify
more completely once we have a clear understanding of what is really
needed. Naturally, the vague aim here is to find values of $c_k$ so that the
sequence of products $c_1 a_1, c_2 a_2, \ldots$ will have terms that are more nearly
equal than the terms of our original sequence. Nevertheless, heuristic
considerations carry us only so far. Ultimately, honest calculation is our 
only reliable guide. 

Here we have the pleasant possibility of simply repeating our earlier 
calculation while keeping our fingers crossed that the new fudge factors 
will provide us with useful flexibility. Thus, if we just follow our nose 
and calculate as before, we find 


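$$\begin{aligned} \sum_{k=1}^\infty (a_1 a_2 \cdots a_k)^{1/k} &= \sum_{k=1}^\infty \frac{(a_1 c_1 a_2 c_2 \cdots a_k c_k)^{1/k}}{(c_1 c_2 \cdots c_k)^{1/k}} \\ &\le \sum_{k=1}^\infty \frac{a_1 c_1 + a_2 c_2 + \cdots + a_k c_k}{k (c_1 c_2 \cdots c_k)^{1/k}} \\ &= \sum_{k=1}^\infty a_k c_k \sum_{j=k}^\infty \frac{1}{j (c_1 c_2 \cdots c_j)^{1/j}}, \end{aligned} \qquad (2.17)$$

and here we should take a breath. From this formula we see that the
proof of the qualitative conjecture (2.16) will be complete if we can find
some choice of the factors $c_k$, $k = 1, 2, \ldots$ such that the sums

$$s_k = c_k \sum_{j=k}^\infty \frac{1}{j (c_1 c_2 \cdots c_j)^{1/j}}, \quad k = 1, 2, \ldots, \qquad (2.18)$$

form a bounded sequence.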
NECESSITY, POSSIBILITY, AND COMFORT


The hunt for a suitable choice of the $c_k$ can take various directions,
but, wherever our compass points, we eventually need to estimate the
sum $s_k$. We should probably try to make this task as easy as possible,
and here we are perhaps lucky that there are only a few series with tail
sums that we can calculate. In fact, almost all of these come from the
telescoping identity

$$\sum_{j=k}^\infty \Big\{ \frac{1}{b_j} - \frac{1}{b_{j+1}} \Big\} = \frac{1}{b_k}$$

that holds for all real monotone sequences $\{b_j : j = 1, 2, \ldots\}$ with $b_j \to \infty$.
Among the possibilities offered by this identity, the simplest choice is
surely given by

$$\sum_{j=k}^\infty \frac{1}{j(j+1)} = \frac{1}{k}, \qquad (2.19)$$


and, when we compare the sums (2.18) and (2.19), we see that $s_k$ may
be put into the simplest form when we define the fudge factors by the
implicit recursion

$$(c_1 c_2 \cdots c_j)^{1/j} = j + 1 \quad \text{for } j = 1, 2, \ldots. \qquad (2.20)$$


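This choice gives us a short formula for $s_k$,

$$s_k = c_k \sum_{j=k}^\infty \frac{1}{j (c_1 c_2 \cdots c_j)^{1/j}} = c_k \sum_{j=k}^\infty \frac{1}{j(j+1)} = \frac{c_k}{k}, \qquad (2.21)$$

and all we need now is to estimate the size of $c_k$.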


THE END OF THE TRAIL 
Fortunately, this estimation is not difficult. From the implicit recursion
(2.20) for $c_j$ applied twice we find that

$$c_1 c_2 \cdots c_{j-1} = j^{j-1} \quad \text{and} \quad c_1 c_2 \cdots c_j = (j+1)^j,$$

so division gives us the explicit formula

$$c_j = \frac{(j+1)^j}{j^{j-1}} = j \Big( 1 + \frac{1}{j} \Big)^j.$$
From this formula and our original bound (2.17) we find

$$\sum_{k=1}^\infty (a_1 a_2 \cdots a_k)^{1/k} \le \sum_{k=1}^\infty \Big( 1 + \frac{1}{k} \Big)^k a_k, \qquad (2.22)$$

and this bound puts Carleman’s inequality (2.15) in our grasp. In fact,
the bound (2.22) is even a bit stronger than Carleman’s inequality since
setting $x = 1/k$ in the familiar analytic bound $1 + x \le e^x$ implies that

$$(1 + 1/k)^k < e \quad \text{for all } k = 1, 2, \ldots.$$
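The whole argument can be checked by computer. The Python sketch below first verifies in exact rational arithmetic that $c_j = (j+1)^j / j^{j-1}$ satisfies the implicit recursion (2.20) and the explicit formula, and that $s_j = c_j/j$ stays below $e$. It then compares the two sides of (2.22) and of Carleman’s inequality (2.15) for random finite positive sequences. The finite check relies on the observation that the calculation leading to (2.22) goes through verbatim when all sums stop at a finite $n$. The function names are hypothetical.

```python
import math
import random
from fractions import Fraction


def fudge_factor(j):
    """c_j = (j+1)^j / j^(j-1), kept exact as a Fraction."""
    return Fraction((j + 1) ** j, j ** (j - 1))


# Exact check of the implicit recursion (2.20) and the explicit formula for c_j.
running = Fraction(1)
for j in range(1, 30):
    c = fudge_factor(j)
    running *= c
    assert running == (j + 1) ** j                       # c_1 c_2 ... c_j = (j+1)^j
    assert c == j * (1 + Fraction(1, j)) ** j            # c_j = j (1 + 1/j)^j
    assert float(c / j) < math.e                         # s_j = c_j / j < e


def carleman_sides(a):
    """For a finite positive list a, return the geometric-mean sum,
    the refined bound from (2.22), and Carleman's bound e * sum(a)."""
    log_prod, gm_sum, refined = 0.0, 0.0, 0.0
    for k, ak in enumerate(a, start=1):
        log_prod += math.log(ak)
        gm_sum += math.exp(log_prod / k)
        refined += (1 + 1 / k) ** k * ak
    return gm_sum, refined, math.e * sum(a)


random.seed(5)
for _ in range(200):
    a = [random.uniform(0.001, 5) for _ in range(random.randint(1, 50))]
    gm_sum, refined, carleman = carleman_sides(a)
    assert gm_sum <= refined <= carleman
```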


EFFICIENCY AND THE CASE OF EQUALITY 


There is more than a common dose of accidental elegance in Pólya’s
proof of Carleman’s inequality, and some care must be taken not to
lose track of the central idea. The insight to be savored is that there
are circumstances where one may greatly improve the effectiveness of
an inequality simply by restructuring the problem so that the
inequality is applied in a situation that is close to the critical case of equality.
Pólya’s proof of Carleman’s inequality illustrates this idea with
exceptional charm, but there are many straightforward situations where its
effect is just as great.


WHO WAS GEORGE PÓLYA?


George Pólya (1887–1985) was one of the most influential
mathematicians of the 20th century, but his most enduring legacy may be the
insight he passed on to us about teaching and learning. Pólya saw the
process of problem solving as a fundamental human activity, one filled
with excitement, creativity, and the love of life. He also thought hard
about how one might become a more effective solver of mathematical
problems and how one might coach others to do so.

Pólya summarized his thoughts in several books, the most famous
of which is How to Solve It. The central premise of Pólya’s text is
that one can often make progress on a mathematical problem by asking
certain general common sense questions. Many of Pólya’s questions may
seem obvious to a natural problem solver — or to anyone else — but, 
nevertheless, the test of time suggests that they possess considerable 
wisdom. 

Some of the richest of Pólya’s suggestions may be repackaged as the
modestly paradoxical question: “What is the simplest problem that you 
cannot solve?” Here, of course, the question presupposes that one al- 
ready has some particular problem in mind, so this suggestion is perhaps 
best understood as shorthand for a longer list of questions which would 
include at least the following: 


• “Can you solve your problem in a special case?”

• “Can you relate your problem to a similar one where the answer is
already known?” and

• “Can you compute anything at all that is related to what you would
really like to compute?”


Every reader is encouraged to experiment with Pólya’s questions while
addressing the exercises. Perhaps no other discipline can contribute 
more to one’s effectiveness as a solver of mathematical problems. 


EXERCISES 


Exercise 2.1 (More from Leap-forward Fall-back Induction) 
Cauchy’s leap-forward, fall-back induction can be used to prove more 
than just the AM-GM inequality; in particular, it can be used to show 
that Cauchy’s inequality for $n = 2$ implies the general result. For
example, by Cauchy’s inequality for $n = 2$ applied twice, one has

$$\begin{aligned} a_1 b_1 + a_2 b_2 + a_3 b_3 + a_4 b_4 &= \{a_1 b_1 + a_2 b_2\} + \{a_3 b_3 + a_4 b_4\} \\ &\le (a_1^2 + a_2^2)^{1/2} (b_1^2 + b_2^2)^{1/2} + (a_3^2 + a_4^2)^{1/2} (b_3^2 + b_4^2)^{1/2} \\ &\le (a_1^2 + a_2^2 + a_3^2 + a_4^2)^{1/2} (b_1^2 + b_2^2 + b_3^2 + b_4^2)^{1/2}, \end{aligned}$$

which is Cauchy’s inequality for $n = 4$. Extend this argument to obtain
Cauchy’s inequality for all $n = 2^k$ and consequently for all $n$. This may
be the method by which Cauchy discovered his famous inequality, even 
though in his textbook he chose to present a different proof. 


Exercise 2.2 (Bernoulli and the Exponential Bound) 
Pólya’s proof of the AM-GM inequality used the analytic bound

$$1 + x \le e^x \quad \text{for all } x \in \mathbb{R}, \qquad (2.23)$$

which is closely related to an inequality of Jacob Bernoulli (1654–1705),

$$1 + nx \le (1 + x)^n \quad \text{for all } x \in [-1, \infty) \text{ and all } n = 1, 2, \ldots. \qquad (2.24)$$

Prove Bernoulli’s inequality (2.24) by induction and show how it may
be used to prove that $1 + x \le e^x$ for all $x \in \mathbb{R}$. Finally, by calculus or
by other means, prove one of the more general versions of Bernoulli’s
inequality suggested by Figure 2.2; for example, prove that

$$1 + px \le (1 + x)^p \quad \text{for all } x \ge -1 \text{ and all } p \ge 1. \qquad (2.25)$$
[Figure 2.2: graphs of $y = (1 + x)^p$ for $p > 1$ and for $0 < p < 1$.]

Fig. 2.2. The graph of $y = (1 + x)^p$ suggests a variety of relationships, each
of which depends on the range of $x$ and the size of $p$. Perhaps the most useful
of these is Bernoulli’s inequality (2.25) where one has $p \ge 1$ and $x \in [-1, \infty)$.


Exercise 2.3 (Bounds by Pure Powers) 

In the day-to-day work of mathematical analysis, one often uses the 
AM-GM inequality to bound a product or a sum of products by a simpler 
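sum of pure powers. Show that for positive $x$, $y$, $\alpha$, and $\beta$ one has

$$x^\alpha y^\beta \le \frac{\alpha}{\alpha + \beta} x^{\alpha + \beta} + \frac{\beta}{\alpha + \beta} y^{\alpha + \beta}, \qquad (2.26)$$

and, for a typical corollary, show that one also has the more timely
bound $x^{2004} y + x y^{2004} \le x^{2005} + y^{2005}$.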


Exercise 2.4 (A Canadian Challenge) 
Participants in the 2002 Canadian Math Olympiad were asked to
prove that for positive $a$, $b$, and $c$ one has the bound

$$a + b + c \le \frac{a^3}{bc} + \frac{b^3}{ac} + \frac{c^3}{ab}$$

and to determine when equality can hold. Can you meet the challenge?


Exercise 2.5 (A Bound Between Differences) 


Show that for $0 \le y \le x$ and integer $n \ge 1$ one has

$$n(x - y)(xy)^{(n-1)/2} \le x^n - y^n. \qquad (2.27)$$
[Figure 2.3: The Geometry of the Geometric Mean; a diagram with points $A$, $B$, $C$ and segments of lengths $a$ and $b$.]

Fig. 2.3. The AM-GM inequality as Euclid could have imagined it. The circle
has radius $(a + b)/2$ and the triangle’s height $h$ cannot be larger. Therefore if
one proves that $h = \sqrt{ab}$ one has a geometric proof of the AM-GM for $n = 2$.


Exercise 2.6 (Geometry of the Geometric Mean) 

There is indeed some geometry behind the definition of the geomet- 
ric mean. The key relations were known to Euclid, although there is 
no evidence that Euclid specifically considered any inequalities. By
appealing to the geometry of Figure 2.3 prove that $h = \sqrt{ab}$ and thereby
automatically deduce that $\sqrt{ab} \le (a + b)/2$.


Exercise 2.7 (One Bounded Product Implies Another) 
Show that for nonnegative $x$, $y$, and $z$ one has the implication

$$1 \le xyz \implies 8 \le (1 + x)(1 + y)(1 + z). \qquad (2.28)$$

Can you also propose a generalization?

Exercise 2.8 (Optimality Principles for Products and Sums)


Given positive $\{a_k : 1 \le k \le n\}$ and positive $c$ and $d$, we consider the
maximization problem $P_1$,

$$\max\{x_1 x_2 \cdots x_n : a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = c\},$$

and the minimization problem $P_2$,

$$\min\{a_1 x_1 + a_2 x_2 + \cdots + a_n x_n : x_1 x_2 \cdots x_n = d\}.$$

Show that for both of these problems the condition for optimality is given
by the relation

$$a_1 x_1 = a_2 x_2 = \cdots = a_n x_n. \qquad (2.29)$$


These optimization principles are extremely productive, and they can 
provide useful guidance even when they do not exactly apply. 
Exercise 2.9 (An Isoperimetric Inequality for the 3-Cube)


Show that among all boxes with a given surface area, the cube has 
the largest volume. Since a box with edge lengths $a$, $b$, and $c$ has surface
area $A = 2ab + 2ac + 2bc$ and since a cube with surface area $A$ has edge
length $(A/6)^{1/2}$, the analytical task is to show

$$abc \le (A/6)^{3/2}$$

and to confirm that equality holds if and only if $a = b = c$.


Exercise 2.10 (Åkerberg’s Refinement)

Show that for any nonnegative real numbers $a_1, a_2, \ldots, a_n$ and $n \ge 2$
one has the bound

$$a_n \Big\{ \frac{a_1 + a_2 + \cdots + a_{n-1}}{n - 1} \Big\}^{n-1} \le \Big\{ \frac{a_1 + a_2 + \cdots + a_n}{n} \Big\}^n. \qquad (2.30)$$

In a way, this relation is a refinement of the AM-GM inequality since the
AM-GM inequality follows immediately by iteration of the bound (2.30).
To prove the recurrence (2.30), one might first show that

$$y(n - y^{n-1}) = ny - y^n \le n - 1 \quad \text{for all } y \ge 0.$$

The key is then to make a wise choice of $y$.
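Here is a minimal Python sketch that tests (2.30) on random data and traces the iteration that yields the AM-GM inequality. The helper names are hypothetical. Writing $A_m$ for the mean of $a_1, \ldots, a_m$, the bound (2.30) says $A_m^m \ge a_m A_{m-1}^{m-1}$, so the values $A_m^m \, a_{m+1} \cdots a_n$ are nonincreasing as $m$ runs from $n$ down to 1. The last of them is $a_1 a_2 \cdots a_n$.

```python
import math
import random


def akerberg_holds(a, rel_tol=1e-12):
    """Check (2.30): a_n * (mean of a_1..a_{n-1})^(n-1) <= (mean of a_1..a_n)^n."""
    n = len(a)
    lhs = a[-1] * (sum(a[:-1]) / (n - 1)) ** (n - 1)
    rhs = (sum(a) / n) ** n
    return lhs <= rhs * (1 + rel_tol)


def iterated_chain(a):
    """The values A_m^m * a_{m+1} * ... * a_n for m = n, n-1, ..., 1,
    where A_m is the mean of a_1..a_m; the last value is a_1 * ... * a_n."""
    n = len(a)
    return [(sum(a[:m]) / m) ** m * math.prod(a[m:]) for m in range(n, 0, -1)]


random.seed(6)
for _ in range(500):
    a = [random.uniform(0, 10) for _ in range(random.randint(2, 8))]
    assert akerberg_holds(a)
    chain = iterated_chain(a)
    # Each step of the chain is one application of (2.30), so it never increases.
    assert all(x >= y * (1 - 1e-12) for x, y in zip(chain, chain[1:]))
```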


Exercise 2.11 (Superadditivity of the Geometric Mean) 


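Show that for nonnegative $a_k$ and $b_k$, $1 \le k \le n$, one has

$$\Big( \prod_{k=1}^n a_k \Big)^{1/n} + \Big( \prod_{k=1}^n b_k \Big)^{1/n} \le \Big( \prod_{k=1}^n (a_k + b_k) \Big)^{1/n}. \qquad (2.31)$$

This inequality of H. Minkowski asserts that the geometric mean is a
superadditive function of its vector of arguments. Show that this inequality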
follows from the AM-GM inequality and determine the circumstances 
under which one can have equality. 

For a generic hint, consider the possibility of dividing both sides by 
the quantity on the right side. Surprisingly often one finds that an 
inequality may become more evident if it is placed in a “standard form” 
which asserts that a given algebraic quantity is bounded by one. 
[Figure 2.4: graph of $y = x/e^{x-1}$ for $0 \le x \le 2$; the level $y = 0.90$ is marked, where the curve is crossed at $x \approx 0.608$ and $x \approx 1.531$.]

Fig. 2.4. The curve $y = x/e^{x-1}$ helps us measure the extent to which the
individual terms of the averages must be squeezed together when the two
sides of the AM-GM bound have a ratio that is close to one. For example, if
we have $y > 0.99$, then we must have $0.864 < x < 1.149$.


Exercise 2.12 (On Approximate Equality in the AM-GM Bound) 


If the nonnegative real numbers $a_1, a_2, \ldots, a_n$ are all approximately
equal to a common value, then it is easy to check that the arithmetic
mean $A$ and the geometric mean $G$ are approximately equal to each other. There are
several ways to frame a converse to this observation, and this exercise
considers an elegant method first proposed by George Polya.

Show that if one has the inequality

$$0 \le \frac{A - G}{A} \le \epsilon < 1, \qquad (2.32)$$

then one has the bound

$$\rho_0 \le \frac{a_k}{A} \le \rho_1 \quad \text{for all } k = 1, 2, \ldots, n, \qquad (2.33)$$

where $\rho_0 \in (0, 1]$ and $\rho_1 \in [1, \infty)$ are the two roots of the equation

$$\frac{x}{e^{x-1}} = (1 - \epsilon)^n. \qquad (2.34)$$

As Figure 2.4 suggests, one key to this result is the observation that
the map $x \mapsto x/e^{x-1}$ is monotone increasing on $[0, 1]$ and monotone
decreasing on $[1, \infty)$.
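The roots $\rho_0$ and $\rho_1$ of (2.34) rarely have closed forms, but the monotonicity just described makes them easy to compute numerically. The Python sketch below uses bisection on each monotone branch of $x/e^{x-1}$ (bisection is our choice, not something the exercise prescribes) and then tests the bound (2.33) on sample data, assuming all $a_k > 0$ so that $G > 0$ and the level $(1 - \epsilon)^n$ is positive. The names `phi`, `polya_roots`, and `polya_bound_holds` are hypothetical.

```python
import math
import random


def phi(x: float) -> float:
    """The map x -> x / e^(x - 1): increasing on [0, 1], decreasing on [1, inf)."""
    return x * math.exp(1.0 - x)


def polya_roots(level: float, tol: float = 1e-12) -> tuple[float, float]:
    """Roots rho0 <= 1 <= rho1 of phi(x) = level, for 0 < level <= 1, by bisection."""
    def bisect(lo: float, hi: float, increasing: bool) -> float:
        while hi - lo > tol:
            mid = (lo + hi) / 2
            # on the increasing branch, phi(mid) < level puts the root to the right;
            # on the decreasing branch, it puts the root to the left
            if (phi(mid) < level) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    hi = 2.0
    while phi(hi) > level:          # move the right end past the larger root
        hi *= 2
    return bisect(0.0, 1.0, True), bisect(1.0, hi, False)


def polya_bound_holds(a: list[float]) -> bool:
    """Check (2.33) for positive a_k, taking epsilon = (A - G)/A in (2.32)."""
    n = len(a)
    A = sum(a) / n
    G = math.prod(a) ** (1 / n)
    eps = (A - G) / A
    rho0, rho1 = polya_roots((1 - eps) ** n)
    return all(rho0 - 1e-9 <= x / A <= rho1 + 1e-9 for x in a)


print(polya_roots(0.90))   # roughly (0.608, 1.532), the points marked in Figure 2.4
rng = random.Random(9)
print(all(polya_bound_holds([rng.uniform(0.1, 10) for _ in range(rng.randint(1, 6))])
          for _ in range(500)))
```

With level $0.99$ the same routine returns roughly $0.865$ and $1.149$, the sort of squeezing that Figure 2.4 illustrates.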
[Figure 2.5: any $n$ points inside the complex right half-plane are contained in a symmetric cone with central angle $2\psi$, where $0 \le \psi < \pi/2$.]


Fig. 2.5. The complex analog of the AM-GM inequality provides a nontrivial
bound on the product $|z_1 z_2 \cdots z_n|^{1/n}$ provided that $z_j$, $j = 1, 2, \ldots, n$, are in
the interior of the right half-plane. The quality of the bound depends on the
central angle of the cone that contains the points.


Exercise 2.13 (An AM-GM Inequality for Complex Numbers) 
Consider a set $S$ of $n$ complex numbers $z_1, z_2, \ldots, z_n$ for which the
polar forms $z_j = \rho_j e^{i\theta_j}$ satisfy the constraints

$$0 \le \rho_j < \infty \quad \text{and} \quad 0 \le |\theta_j| \le \psi < \pi/2, \qquad 1 \le j \le n.$$

As one sees in Figure 2.5, the spread in the arguments of the $z_j \in S$ is
bounded by $2\psi$. Show that for such numbers one has the bound

$$(\cos\psi)\,|z_1 z_2 \cdots z_n|^{1/n} \le \frac{1}{n}\,|z_1 + z_2 + \cdots + z_n|. \qquad (2.35)$$

Here one should note that if the $z_j$, $j = 1, 2, \ldots, n$, are all nonnegative real numbers,
then one can take $\psi = 0$, in which case the bound (2.35) recaptures the
usual AM-GM inequality.
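Here is a Python sketch, with hypothetical helper names, that samples points in the cone $|\arg z| \le \psi$ and evaluates the two sides of (2.35). It is only a sanity check of the statement, under our own choice of sampling ranges.

```python
import cmath
import math
import random


def complex_amgm_gap(zs: list[complex], psi: float) -> float:
    """(1/n)|z_1 + ... + z_n| - cos(psi) |z_1 ... z_n|^(1/n).

    The bound (2.35) says this is >= 0 whenever every |arg z_j| <= psi < pi/2.
    """
    n = len(zs)
    product = 1 + 0j
    for z in zs:
        product *= z
    return abs(sum(zs)) / n - math.cos(psi) * abs(product) ** (1 / n)


def random_cone_points(rng: random.Random, n: int, psi: float) -> list[complex]:
    # z = rho * e^{i theta} with 0.01 <= rho <= 10 and |theta| <= psi
    return [cmath.rect(rng.uniform(0.01, 10), rng.uniform(-psi, psi)) for _ in range(n)]


rng = random.Random(3)
worst = min(
    complex_amgm_gap(random_cone_points(rng, rng.randint(1, 6), psi), psi)
    for psi in (0.0, 0.5, 1.0, 1.5)
    for _ in range(2_000)
)
print(worst >= -1e-9)
```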


Exercise 2.14 (A Leap-Forward Fall-Back Tour de Force) 
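One can use Cauchy’s leap-forward fall-back method of induction to
prove that for all nonnegative $x_1, x_2, \ldots, x_m$ and for all integer powers
$n = 1, 2, \ldots$ one has the bound

$$\left\{\frac{x_1 + x_2 + \cdots + x_m}{m}\right\}^n \le \frac{x_1^n + x_2^n + \cdots + x_m^n}{m}. \qquad (2.36)$$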


This is a special case of the power mean inequality which we develop 
at length in Chapter 8, but here the focus is on mastery of technique. 
This exercise leads to one of the more sustained applications of Cauchy’s 
method that one is likely to meet. 
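Because (2.36) involves only integer powers, it can be checked exactly with rational arithmetic. This Python sketch (hypothetical names; the standard `fractions` module supplies the exact arithmetic) tests the bound on random nonnegative integers. It illustrates the statement, not Cauchy’s method of proof.

```python
from fractions import Fraction
import random


def power_mean_bound_holds(xs: list[int], n: int) -> bool:
    """Check (2.36) exactly: ((x_1 + ... + x_m)/m)^n <= (x_1^n + ... + x_m^n)/m."""
    m = len(xs)
    left = Fraction(sum(xs), m) ** n
    right = Fraction(sum(x ** n for x in xs), m)
    return left <= right


rng = random.Random(4)
print(all(
    power_mean_bound_holds([rng.randint(0, 20) for _ in range(rng.randint(1, 7))], n)
    for n in range(1, 9)
    for _ in range(300)
))   # True
```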
Lagrange’s Identity
and Minkowski’s Conjecture 


The inductive proof of Cauchy’s inequality used the polynomial identity

$$(a_1^2 + a_2^2)(b_1^2 + b_2^2) = (a_1 b_1 + a_2 b_2)^2 + (a_1 b_2 - a_2 b_1)^2, \qquad (3.1)$$


but that proof made no attempt to exploit this formula to the fullest. 
In particular, we completely ignored the term $(a_1b_2 - a_2b_1)^2$ except for
noting that it must be nonnegative. To be sure, any inequality must 
strike a compromise between precision and simplicity, but no one wants 
to be wasteful. Thus, we face a natural question: Can one extract any 
useful information from the castaway term? 

One can hardly doubt that the term $(a_1b_2 - a_2b_1)^2$ captures some
information. At a minimum, it provides an explicit measure of the dif- 
ference between the squares of the two sides of Cauchy’s inequality, so 
perhaps it can provide a useful way to measure the defect that one incurs 
with each application of Cauchy’s inequality. 

The basic factorization (3.1) also tells us that for $n = 2$ one has
equality in Cauchy’s inequality exactly when $(a_1b_2 - a_2b_1)^2 = 0$; so,
assuming that $(b_1, b_2) \ne (0, 0)$, we see that we have equality if and only
if $(a_1, a_2)$ and $(b_1, b_2)$ are proportional in the sense that

$$a_1 = \lambda b_1 \quad \text{and} \quad a_2 = \lambda b_2 \quad \text{for some real } \lambda.$$


This observation has far-reaching consequences, and the first challenge 
problem invites one to prove an analogous characterization of the case 
of equality for the n-dimensional Cauchy inequality. 


Problem 3.1 (On Equality in Cauchy’s Bound) 

Show that if $(b_1, b_2, \ldots, b_n) \ne 0$ then equality holds in Cauchy’s
inequality if and only if there is a constant $\lambda$ such that $a_i = \lambda b_i$ for all
$i = 1, 2, \ldots, n$. Also, as before, if you already know a proof of this fact,
you are invited to find a new one.
PASSAGE TO A MORE GENERAL IDENTITY


Since the identity (3.1) provides a quick solution to Problem 3.1 when
$n = 2$, one way to try to solve the problem in general is to look for a
suitable extension of the identity (3.1) to $n$ dimensions. Thus, if we
introduce the polynomial $Q_n = Q_n(a_1, a_2, \ldots, a_n; b_1, b_2, \ldots, b_n)$
that is given by the difference of the squares of the two sides of Cauchy’s
inequality, then $Q_n$ equals

$$(a_1^2 + a_2^2 + \cdots + a_n^2)(b_1^2 + b_2^2 + \cdots + b_n^2) - (a_1b_1 + a_2b_2 + \cdots + a_nb_n)^2,$$

and $Q_n$ measures the “defect” in Cauchy’s inequality in $n$ dimensions,
just like $Q_2 = (a_1b_2 - a_2b_1)^2$ measures the defect in two dimensions. We
have already seen that $Q_2$ can be written as the square of a polynomial,
and now the challenge is to see if there is an analogous representation
of $Q_n$ as a square, or possibly as a sum of squares.

If we simply expand $Q_n$, then we find that it can be written as

$$Q_n = \sum_{i=1}^n \sum_{j=1}^n a_i^2 b_j^2 - \sum_{i=1}^n \sum_{j=1}^n a_i b_i a_j b_j. \qquad (3.2)$$
As it sits, this formula may not immediately suggest any way to make 
further progress. We could use a nice hint, and even though there is no 
hint that always helps, there is a general principle that often provides 
useful guidance: pursue symmetry. 


SYMMETRY AS A HINT 


In practical terms, the suggestion to pursue symmetry just means that 
we should try to write our identity in a way that makes any symmetry 
as clear as possible. Here, the symmetry between $i$ and $j$ in the second
double sum is forceful and clear, yet the symmetrical role of $i$ and $j$ in the
first double sum is not quite as evident. To be sure, symmetry is there,
and we can make it stand out better if we rewrite $Q_n$ in the form

$$Q_n = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n (a_i^2 b_j^2 + a_j^2 b_i^2) - \sum_{i=1}^n \sum_{j=1}^n a_i b_i a_j b_j. \qquad (3.3)$$

Now both double sums display transparent symmetry in i and j, and 
the new representation does suggest how to make progress; it almost 
screams for us to bring the two double sums together, and once this is 
done, one quickly finds the factorization

$$Q_n = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \{a_i^2 b_j^2 - 2a_ib_ja_jb_i + a_j^2 b_i^2\} = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n (a_ib_j - a_jb_i)^2.$$
The whole story now fits into a single, informative, self-verifying line
known as Lagrange’s Identity:

$$\left(\sum_{i=1}^n a_i b_i\right)^2 = \sum_{i=1}^n a_i^2 \sum_{i=1}^n b_i^2 - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n (a_i b_j - a_j b_i)^2. \qquad (3.4)$$

Our path to this identity was motivated by our desire to understand 
the nonnegative polynomial $Q_n$, but, once the identity (3.4) is written
down, it is easily verified just by multiplication. Thus, we meet one of 
the paradoxes of polynomial identities. 
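Since the identity (3.4) is verified just by multiplication, a computer can do that multiplication for particular vectors. The Python sketch below (our illustration; `lagrange_sides` and `random_check` are hypothetical names) evaluates both sides in exact integer arithmetic, using the form $\frac{1}{2}\sum_i\sum_j$ of the double sum.

```python
import random


def lagrange_sides(a: list[int], b: list[int]) -> tuple[int, int]:
    """Both sides of Lagrange's identity (3.4), computed exactly in integer arithmetic."""
    n = len(a)
    left = sum(x * y for x, y in zip(a, b)) ** 2
    # the double sum over all (i, j) counts each unordered pair twice, so it is even
    double_sum = sum((a[i] * b[j] - a[j] * b[i]) ** 2 for i in range(n) for j in range(n))
    right = sum(x * x for x in a) * sum(y * y for y in b) - double_sum // 2
    return left, right


def random_check(trials: int = 200, seed: int = 10) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 7)
        a = [rng.randint(-20, 20) for _ in range(n)]
        b = [rng.randint(-20, 20) for _ in range(n)]
        left, right = lagrange_sides(a, b)
        if left != right:
            return False
    return True


print(lagrange_sides([1, 2, 3], [4, 5, 6]))   # (1024, 1024)
print(random_check())                         # True
```

A finite set of checks confirms instances only; the algebra above gives the identity itself.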

One should note that Cauchy’s inequality is an immediate corollary of 
Lagrange’s identity, and, indeed, the proof that Cauchy chose to include 
in his 1821 textbook was based on just this observation. Here, we went 
in search of what became Lagrange’s identity (3.4) because we hoped it 
might lead to a clear understanding of the case of equality in Cauchy’s 
inequality. Along the way, we happened to find an independent proof of 
Cauchy’s inequality, but we still need to close the loop on our challenge 
problem. 


EQUALITY AND A GAUGE OF PROPORTIONALITY 

If $(b_1, b_2, \ldots, b_n) \ne 0$, then there exists some $b_k \ne 0$, and if equality
holds in Cauchy’s inequality, then every term $(a_ib_j - a_jb_i)^2$ in the double sum
on the right-hand side of Lagrange’s identity (3.4) must be zero. If we consider just
the terms that contain $b_k$, then we find

$$a_i b_k = a_k b_i \quad \text{for all } 1 \le i \le n,$$

and, if we take $\lambda = a_k/b_k$, then we also have

$$a_i = \lambda b_i \quad \text{for all } 1 \le i \le n.$$


That is, Lagrange’s identity tells us that for nonzero sequences one can 
have equality in Cauchy’s inequality if and only if the two sequences are 
proportional. Thus we have a complete and precise answer to our first 
challenge problem. 
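The argument just given is constructive: once $Q_n = 0$, the constant of proportionality can be read off from any $b_k \ne 0$. A Python sketch of that recipe follows, assuming integer entries so that `fractions.Fraction` gives exact ratios; `q_n` and `proportionality_constant` are hypothetical names.

```python
from fractions import Fraction


def q_n(a: list[int], b: list[int]) -> int:
    """The defect Q_n = (sum a_i^2)(sum b_i^2) - (sum a_i b_i)^2 in Cauchy's inequality."""
    return sum(x * x for x in a) * sum(y * y for y in b) - sum(x * y for x, y in zip(a, b)) ** 2


def proportionality_constant(a: list[int], b: list[int]):
    """Return lambda with a_i = lambda * b_i for every i when Q_n = 0, otherwise None.

    Assumes integer entries and b != 0; as in the text, lambda = a_k / b_k
    for any index k with b_k != 0.
    """
    if q_n(a, b) != 0:
        return None
    k = next(i for i, y in enumerate(b) if y != 0)
    lam = Fraction(a[k], b[k])
    assert all(x == lam * y for x, y in zip(a, b))   # what Lagrange's identity guarantees
    return lam


print(proportionality_constant([2, 4, 6], [1, 2, 3]))   # 2
print(proportionality_constant([1, 2, 3], [1, 2, 4]))   # None
```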

This analysis of the case of equality underscores that the symmetric
form

$$Q_n = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n (a_i b_j - a_j b_i)^2$$


has two useful interpretations. We introduced it originally as a measure 
of the difference between the two sides of Cauchy’s inequality, but we 
see now that it is also a measure of the extent to which the two vectors 
$(a_1, a_2, \ldots, a_n)$ and $(b_1, b_2, \ldots, b_n)$ are proportional. Moreover, $Q_n$ is
such a natural measure of proportionality that one can well imagine a
feasible course of history where the measure $Q_n$ appears on the scene
before Cauchy’s inequality is conceived. This modest inversion of history
has several benefits; in particular, it leads one to a notable inequality of
E.A. Milne which is described in Exercise 3.8.


ROOTS AND BRANCHES OF LAGRANGE’S IDENTITY 


Joseph Louis de Lagrange (1736-1813) developed the case n = 3 of 
the identity (3.4) in 1773 in the midst of an investigation of the geom- 
etry of pyramids. The study focused on questions in three-dimensional 
space, and Lagrange did not mention that the corresponding results for 
$n = 2$ were well known, even to the mathematicians of antiquity. In
particular, the two-dimensional version of the identity (3.4) was known
to the Alexandrian Greek mathematician Diophantus, or at least one
can draw that inference from a problem that Diophantus included in his 
textbook Arithmetica, a volume whose provenance can only be traced 
to sometime between 50 A.D. and 300 A.D. 

Lagrange and his respected predecessor Pierre de Fermat (1601-1665) 
were quite familiar with the writings of Diophantus. In fact, much 
of what we know today of Fermat’s discoveries comes to us from the 
marginal comments that Fermat made in his copy of the Bachet
translation of Diophantus’s Arithmetica. In just such a note, Fermat asserted
that for $n \ge 3$ the equation $x^n + y^n = z^n$ has no solution in positive
integers, and he also wrote “I have discovered a truly remarkable proof 
which this margin is too small to contain.” 

As all the world knows now, this assertion eventually came to be 
known as Fermat’s Last Theorem, or, more aptly, Fermat’s conjecture; 
and for more than three centuries, the conjecture eluded the best efforts 
of history’s finest mathematicians. The world was shocked — and at 
least partly incredulous — when in 1993 Andrew Wiles announced that 
he had proved Fermat’s conjecture. Nevertheless, within a year or so the 
proof outlined by Wiles had been checked by the leading experts, and it 
was acknowledged that Wiles had done the deed that many considered 
to be beyond human possibility. 


PERSPECTIVE ON A GENERAL METHOD 


Our derivation of Lagrange’s identity began with a polynomial that 
we knew to be nonnegative, and we then relied on elementary algebra 
and good fortune to show that the polynomial could be written as a sum 
of squares. The resulting identity did not need long to reveal its power.
In particular, it quickly provided an independent proof of Cauchy’s in- 
equality and a transparent explanation for the necessary and sufficient 
conditions for equality. 

This experience even suggests an interesting way to search for new, 
useful, polynomial identities. We just take any polynomial that we know 
to be nonnegative, and we then look for a representation of that poly- 
nomial as a sum of squares. If our experience with Lagrange’s identity 
provides a reliable guide, the resulting polynomial identity should have 
a fair chance of being interesting and informative. 

There is only one problem with this plan — we do not know any 
systematic way to write a nonnegative polynomial as a sum of squares. 
In fact, we do not even know if such a representation is always possible, 
and this observation brings us to our second challenge problem. 


Problem 3.2 Can one always write a nonnegative polynomial as a sum 
of squares? That is, if the real polynomial $P(x_1, x_2, \ldots, x_n)$ satisfies

$$P(x_1, x_2, \ldots, x_n) \ge 0 \quad \text{for all } (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n,$$

can one find a set of $s$ real polynomials $Q_k(x_1, x_2, \ldots, x_n)$, $1 \le k \le s$,
such that

$$P(x_1, x_2, \ldots, x_n) = Q_1^2 + Q_2^2 + \cdots + Q_s^2?$$


This problem turns out to be wonderfully rich. It leads to work that 
is deeper and more wide ranging than our earlier problems, and, even 
now, it continues to inspire new research. 


A DEFINITIVE ANSWER — IN A SPECIAL CASE 


As usual, one does well to look for motivation by examining some 
simple cases. Here the first case that is not completely trivial occurs 
when $n = 1$ and the polynomial $P(x)$ is simply a quadratic $ax^2 + bx + c$
with $a \ne 0$. Now, if we recall the method of completing the square that
one uses to derive the quadratic formula, we then see that $P(x)$ can be
written as

$$P(x) = ax^2 + bx + c = a\left(x + \frac{b}{2a}\right)^2 + \frac{4ac - b^2}{4a}, \qquad (3.5)$$
and this representation very nearly answers our question. We only need 
to check that the last two summands may be written as the squares of 
real polynomials. 
If we consider large values of $x$, we see that $P(x) \ge 0$ implies that
$a > 0$, and if we take $x_0 = -b/2a$, then from the sum (3.5) we see that
$P(x_0) \ge 0$ implies $4ac - b^2 \ge 0$. The bottom line is that both terms on
the right-hand side of the identity (3.5) are nonnegative, so $P(x)$ can be
written as $Q_1^2 + Q_2^2$ where $Q_1$ and $Q_2$ are real polynomials which we can
write explicitly as

$$Q_1(x) = a^{1/2}\left(x + \frac{b}{2a}\right) \quad \text{and} \quad Q_2(x) = \frac{\sqrt{4ac - b^2}}{2\sqrt{a}}.$$

This solves our problem for quadratic polynomials of one variable, and
even though the solution is simple, it is not trivial. In particular, the
identity (3.5) has some nice corollaries. For example, it shows that $P(x)$
is minimized when $x = -b/2a$ and that the minimum value of $P(x)$
is equal to $(4ac - b^2)/4a$, two useful facts that are more commonly
obtained by calculus. 
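The identity (3.5) is already an algorithm: given $a > 0$ and $4ac - b^2 \ge 0$, it produces the two polynomials $Q_1$ and $Q_2$. A short Python sketch, with the hypothetical name `quadratic_sos`, returns them as functions.

```python
import math


def quadratic_sos(a: float, b: float, c: float):
    """Return (Q1, Q2) with a x^2 + b x + c = Q1(x)^2 + Q2(x)^2, following (3.5).

    Requires a > 0 and 4ac - b^2 >= 0, the conditions derived in the text.
    """
    disc = 4 * a * c - b * b
    if a <= 0 or disc < 0:
        raise ValueError("a x^2 + b x + c is not a nonnegative quadratic")

    def q1(x: float) -> float:
        return math.sqrt(a) * (x + b / (2 * a))

    def q2(x: float) -> float:
        return math.sqrt(disc) / (2 * math.sqrt(a))   # a constant polynomial

    return q1, q2


q1, q2 = quadratic_sos(2.0, -4.0, 5.0)
print(q1(1.0), q2(1.0) ** 2)   # Q1 vanishes at x = -b/2a = 1, where P takes its minimum 3
```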


EXPLOITING WHAT WE KNOW 


The simplest nontrivial case of Lagrange’s identity is 


$$(a_1^2 + a_2^2)(b_1^2 + b_2^2) = (a_1b_1 + a_2b_2)^2 + (a_1b_2 - a_2b_1)^2,$$


and, since polynomials may be substituted for the reals in this formula,
we find that it provides us with a powerful fact: the set of polynomials
that can be written as the sum of squares of two polynomials is closed
under multiplication. That is, if $P(x) = Q(x)R(x)$ where $Q(x)$ and $R(x)$
have the representations

$$Q(x) = Q_1^2(x) + Q_2^2(x) \quad \text{and} \quad R(x) = R_1^2(x) + R_2^2(x),$$

then $P(x)$ also has a representation as a sum of two squares. More
precisely, if we have

$$P(x) = Q(x)R(x) = \{Q_1^2(x) + Q_2^2(x)\}\{R_1^2(x) + R_2^2(x)\},$$

then $P(x)$ can also be written as

$$\{Q_1(x)R_1(x) + Q_2(x)R_2(x)\}^2 + \{Q_1(x)R_2(x) - Q_2(x)R_1(x)\}^2. \qquad (3.6)$$


This identity suggests that induction may be of help. We have already 
seen that a nonnegative polynomial of degree two can be written as a 
sum of squares, so an inductive proof has no trouble getting started. 
We should then be able to use the representation (3.6) to complete the 
induction, once we understand how nonnegative polynomials can be fac- 
tored. 
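The identity (3.6) is also directly computable. In this Python sketch, polynomials are coefficient lists (lowest degree first, a representation chosen for simplicity), and the hypothetical function `two_square_product` forms the two new squares from the four old ones.

```python
# Polynomials in x as coefficient lists, lowest degree first: [c0, c1, c2] is c0 + c1 x + c2 x^2.

def trim(p: list[int]) -> list[int]:
    # drop trailing zero coefficients (keep at least one entry)
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def pmul(p: list[int], q: list[int]) -> list[int]:
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def padd(p: list[int], q: list[int], sign: int = 1) -> list[int]:
    # p + sign * q
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [x + sign * y for x, y in zip(p, q)]


def two_square_product(q1, q2, r1, r2):
    """Given Q = q1^2 + q2^2 and R = r1^2 + r2^2, return (s1, s2) with
    Q * R = s1^2 + s2^2, following the identity (3.6)."""
    s1 = padd(pmul(q1, r1), pmul(q2, r2))
    s2 = padd(pmul(q1, r2), pmul(q2, r1), sign=-1)
    return trim(s1), trim(s2)


# (x^2 + 1)(x^2 + 4) with Q1 = x, Q2 = 1, R1 = x, R2 = 2
s1, s2 = two_square_product([0, 1], [1], [0, 1], [2])
print(s1, s2)   # [2, 0, 1] [0, 1], i.e. x^4 + 5x^2 + 4 = (x^2 + 2)^2 + x^2
```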
FACTORIZATION OF NONNEGATIVE POLYNOMIALS


Two cases now present themselves; either $P(x)$ has a real root, or it
does not. When $P(x)$ has a real root $r$ with multiplicity $m$, we can write

$$P(x) = (x - r)^m R(x) \quad \text{where } R(r) \ne 0,$$

so, if we set $x = r + \epsilon$, then we have $P(r + \epsilon) = \epsilon^m R(r + \epsilon)$. Also, by the
continuity of $R$, there is a $\delta > 0$ such that $R(r + \epsilon)$ has the same sign for all
$\epsilon$ with $|\epsilon| < \delta$. Since $P(x)$ is always nonnegative, we then see that $\epsilon^m$
has the same sign for all $\epsilon$ with $0 < |\epsilon| < \delta$, so $m$ must be even. If we set $m = 2k$,
we see that

$$P(x) = Q^2(x)R(x) \quad \text{where } Q(x) = (x - r)^k,$$


and, from this representation, we see that R(x) is also a nonnegative 
polynomial. Thus, we have found a useful factorization for the case 
when P(x) has a real root. 

Now, suppose that $P(x)$ has no real roots. By the fundamental theorem
of algebra, there is a complex root $r$, and since $P$ has real coefficients,

$$0 = P(r) \quad \text{implies} \quad 0 = \overline{P(r)} = P(\bar{r}),$$

we see that the complex conjugate $\bar{r}$ is also a root of $P$. Thus, $P$ has
the factorization

$$P(x) = (x - r)(x - \bar{r})R(x) = Q(x)R(x).$$


The real polynomial $Q(x) = (x - r)(x - \bar{r})$ is positive for large $x$, and
it has no real zeros, so it must be positive for all real $x$. By
assumption, $P(x)$ is nonnegative, so we see that $R(x)$ is also nonnegative. Thus,
again we find that any nonnegative polynomial P(x) with degree greater 
than two can be written as the product of two nonconstant, nonnega- 
tive polynomials. By induction, we therefore find that any nonnegative 
polynomial in one variable can be written as the sum of the squares of 
two real polynomials. 
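For readers who like to see an explicit representation, there is a closely related route that differs from the induction above: if $P(x) = c\prod_k (x - r_k)(x - \bar{r}_k)$ with $c > 0$, write $F(x) = \prod_k (x - r_k) = A(x) + iB(x)$ with real polynomials $A$ and $B$; then for real $x$ one has $P(x) = c\,|F(x)|^2 = c\{A(x)^2 + B(x)^2\}$. The Python sketch below carries this out, assuming the roots are already known (finding them numerically is a separate task); `two_squares_from_roots` is a hypothetical name.

```python
import math


def pmul(p: list[complex], q: list[complex]) -> list[complex]:
    # product of coefficient lists (lowest degree first); complex coefficients allowed
    out = [0j] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def peval(p: list[float], x: float) -> float:
    # Horner evaluation of a real coefficient list at x
    total = 0.0
    for c in reversed(p):
        total = total * x + c
    return total


def two_squares_from_roots(c: float, half_roots: list[complex]):
    """Hypothetical helper: for P(x) = c * prod_k (x - r_k)(x - conj(r_k)) with c > 0,
    return real coefficient lists (A1, A2) with P = A1^2 + A2^2.

    half_roots holds one root from each conjugate pair; a real root of
    multiplicity 2j is listed j times.
    """
    f = [1 + 0j]
    for r in half_roots:
        f = pmul(f, [-r, 1])                 # multiply by (x - r)
    s = math.sqrt(c)
    # f = A + iB with A, B real; for real x, P(x) = c |f(x)|^2 = c (A^2 + B^2)
    return [s * z.real for z in f], [s * z.imag for z in f]


A1, A2 = two_squares_from_roots(1.0, [1j, 2j])   # P(x) = (x^2 + 1)(x^2 + 4)
print(peval(A1, 1.5) ** 2 + peval(A2, 1.5) ** 2, (1.5**2 + 1) * (1.5**2 + 4))   # 20.3125 20.3125
```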


ONE VARIABLE DOWN — ONLY N VARIABLES TO GO 


Our success with polynomials of one variable naturally encourages us 
to consider nonnegative polynomials in two or more variables. Unfortunately,
the gap between a one-variable problem and a two-variable
problem sometimes turns out to be wider than the Grand Canyon.

For polynomials in two variables, the zero sets {(x,y) : P(x,y) = 0} 
are no longer simple discrete sets of points. Now they can take on a 
bewildering variety of geometrical shapes that almost defy classification. 
After some exploration, we may even come to believe that there might
exist nonnegative polynomials of two variables that cannot be written 
as the sum of squares of real polynomials. This is precisely what the 
great mathematician Hermann Minkowski first suggested, and, if we are 
to give full measure to the challenge problem, we will need to prove 
Minkowski’s conjecture. 


THE STRANGE POWER OF LIMITED POSSIBILITIES 


There is an element of hubris to taking up a problem that defeated 
Minkowski, but there are times when hubris pays off. Ironically, there are 
even times when we can draw strength from the fact that we have very 
few ideas to try. Here, for example, we know so few ways to construct 
nonnegative polynomials that we have little to lose from seeing where 
those ways might lead. Most of the time, such explorations just help 
us understand a problem more deeply, but once in a while, a fresh, 
elementary approach to a difficult problem can lead to a striking success. 


WHAT ARE OUR OPTIONS?


How can we construct a nonnegative polynomial? Polynomials that 
are given to us as sums of squares of real polynomials are always nonneg- 
ative, but such polynomials cannot help us with Minkowski’s conjecture. 
We might also consider the nonnegative polynomials that one finds by 
squaring both sides of Cauchy’s inequality and taking the difference, but 
Lagrange’s identity tells us that this construction is also doomed. Fi- 
nally, we might consider those polynomials that the AM-GM inequality 
tells us must be nonnegative. For the moment this is our only feasible 
idea, so it obviously deserves a serious try. 


THE AM-GM PLAN 


We found earlier that nonnegative real numbers $a_1, a_2, \ldots, a_n$ must
satisfy the AM-GM inequality

$$(a_1 a_2 \cdots a_n)^{1/n} \le \frac{a_1 + a_2 + \cdots + a_n}{n}, \qquad (3.7)$$

and we can use this inequality to construct a vast collection of
nonnegative polynomials. Nevertheless, if we do not want to get lost in
complicated examples, we need to limit our search to the very simplest
cases. Here, the simplest choices for nonnegative $a_1$ and $a_2$ are $a_1 = x^2$
and $a_2 = y^2$; so, if we want to make the product $a_1a_2a_3$ as simple as
possible, we can take $a_3 = 1/x^2y^2$ so that $a_1a_2a_3$ just equals one. The
AM-GM inequality then tells us that

$$1 \le \frac{1}{3}\left(x^2 + y^2 + \frac{1}{x^2y^2}\right) \quad \text{for } xy \ne 0,$$

and, after the natural simplifications (multiplying through by $3x^2y^2$), we see that the polynomial

$$P(x, y) = x^4y^2 + x^2y^4 - 3x^2y^2 + 1$$

is nonnegative for all choices of real $x$ and $y$ (when $xy = 0$ one simply has $P(x, y) = 1$);
thus, we find our first
serious candidate for such a polynomial that cannot be written in the
form

$$P(x, y) = Q_1^2(x, y) + Q_2^2(x, y) + \cdots + Q_s^2(x, y) \qquad (3.8)$$

for some integer $s$. Now we only need to find some way to argue that the
representation (3.8) is indeed impossible. We only have elementary tools 
at our disposal, but these may well suffice. Even a modest exploration 
shows that the representation (3.8) is quite confining. 
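Before pursuing the impossibility argument, it may help to see the AM-GM conclusion numerically. This Python sketch (the name `motzkin` is our label for the candidate polynomial, and the grid is an arbitrary choice) evaluates $P(x, y)$ on a grid; it is evidence, not proof, of nonnegativity.

```python
def motzkin(x: float, y: float) -> float:
    """The candidate polynomial P(x, y) = x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1."""
    return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1


# Evaluate P on a grid over [-2, 2]^2 with step 0.01. The AM-GM argument says every
# value is >= 0, with equality when x^2 = y^2 = 1/(x^2 y^2), that is, |x| = |y| = 1.
grid = [i / 100 for i in range(-200, 201)]
smallest = min(motzkin(x, y) for x in grid for y in grid)
print(smallest >= -1e-12, motzkin(1, 1), motzkin(-1, 1))   # True 0 0
```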

For example, we first note that our candidate polynomial $P(x, y)$ has
degree six, so none of the polynomials $Q_k$ can have degree greater than
three, since the terms of highest degree in the squares $Q_k^2$ are themselves
squares and cannot cancel one another. Moreover, when we specialize by taking $y = 0$, we find

$$1 = P(x, 0) = Q_1^2(x, 0) + Q_2^2(x, 0) + \cdots + Q_s^2(x, 0),$$

while by taking $x = 0$, we find

$$1 = P(0, y) = Q_1^2(0, y) + Q_2^2(0, y) + \cdots + Q_s^2(0, y),$$

so both of the univariate polynomials $Q_k^2(x, 0)$ and $Q_k^2(0, y)$ must be
bounded, and hence constant. From this observation and the fact that each polynomial
$Q_k(x, y)$ has degree not greater than three, we see that they must be of
the form

$$Q_k(x, y) = a_k + b_k xy + c_k x^2y + d_k xy^2 \qquad (3.9)$$

for some constants $a_k$, $b_k$, $c_k$, and $d_k$.

Minkowski’s conjecture is now on the ropes; we just need to land a 
knock-out punch. When we look back at our candidate P(x, y), we see 
the striking feature that all of its coefficients are nonnegative except for
the coefficient of $x^2y^2$ which is equal to $-3$. This observation suggests
that we should see what one can say about the possible values of the
coefficient of $x^2y^2$ in the sum $Q_1^2(x, y) + Q_2^2(x, y) + \cdots + Q_s^2(x, y)$.

Here we have some genuine luck. By the explicit form (3.9) of the
terms $Q_k(x, y)$, $1 \le k \le s$, we can easily check that the coefficient
of $x^2y^2$ in the polynomial $Q_1^2(x, y) + Q_2^2(x, y) + \cdots + Q_s^2(x, y)$ is just
$b_1^2 + b_2^2 + \cdots + b_s^2$. Since this sum is nonnegative, it cannot equal $-3$,
and, consequently, the nonnegative polynomial $P(x, y)$ cannot be written
as a sum of squares of real polynomials. Remarkably enough, the
AM-GM inequality has guided us successfully to a proof of Minkowski’s
conjecture.
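The coefficient computation at the heart of the argument can be mechanized. In the following Python sketch (hypothetical names; polynomials in $x, y$ are stored as dictionaries keyed by exponent pairs, our choice of representation) we square polynomials of the shape (3.9) with random integer coefficients and read off the coefficient of $x^2y^2$.

```python
import random
from collections import defaultdict


def square(poly: dict) -> dict:
    """Square a polynomial in x, y stored as {(i, j): c}, meaning the sum of c * x^i * y^j."""
    out = defaultdict(int)
    for (i1, j1), c1 in poly.items():
        for (i2, j2), c2 in poly.items():
            out[(i1 + i2, j1 + j2)] += c1 * c2
    return dict(out)


def x2y2_coefficient(a: int, b: int, c: int, d: int) -> int:
    # Q = a + b*xy + c*x^2*y + d*x*y^2, the shape forced by (3.9)
    q = {(0, 0): a, (1, 1): b, (2, 1): c, (1, 2): d}
    return square(q).get((2, 2), 0)


rng = random.Random(5)
qs = [tuple(rng.randint(-9, 9) for _ in range(4)) for _ in range(6)]
total = sum(x2y2_coefficient(*q) for q in qs)
print(total, sum(q[1] ** 2 for q in qs))   # equal, and never negative, so never -3
```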


SOME PERSPECTIVE ON MINKOWSKI’S CONJECTURE 


We motivated Minkowski’s conjecture by our exploration of Lagrange’s 
identity, and we proved Minkowski’s conjecture by making good use of 
the AM-GM inequality. This is a logical and instructive path. Never- 
theless, it strays a long way from the historical record, and it may leave 
the wrong impression. 

While it is not precisely clear what led Minkowski to his conjecture, he 
was most likely concerned at first with number theoretic results such as 
the classic theorem of Lagrange which asserts that every natural number 
may be written as the sum of four or fewer perfect squares. In any 
event, Minkowski brought his conjecture to David Hilbert, and in 1888, 
Hilbert published a proof of the existence of nonnegative polynomials 
that cannot be written as a sum of the squares of real polynomials. 
Hilbert’s proof was long, subtle, and indirect. 

The first explicit example of a nonnegative polynomial that cannot be 
written as the sum of the squares of real polynomials was given in 1967, 
almost eighty years after Hilbert proved the existence of such polynomi- 
als. The explicit example was discovered by T.S. Motzkin, and he used 
precisely the same AM-GM technique described here. 


HILBERT’S 17TH PROBLEM 


In 1900, David Hilbert gave an address in Paris to the second Inter- 
national Congress of Mathematicians which many regard as the most 
important mathematical address of all time. In his lecture, Hilbert de- 
scribed 23 problems which he believed to be worth the attention of the 
world’s mathematicians at the dawn of the 20th century. The prob- 
lems were wisely chosen, and they have had a profound influence on the 
development of mathematics over the past one hundred years. 

The 17th problem on Hilbert’s great list is a direct descendant of 
Minkowski’s conjecture, and in this problem Hilbert asked if every non- 
negative polynomial in n variables must have a representation as a sum 
of squares of ratios of polynomials. This modification of Minkowski’s 
problem makes all the difference, and Hilbert’s question was answered 
affirmatively in 1927 by Emil Artin. Artin’s solution of Hilbert’s 17th 
problem is now widely considered to be one of the crown jewels of modern
algebra.


EXERCISES 


Exercise 3.1 (A Trigonometric Path to Discovery) 
One only needs multiplication to verify the identity of Diophantus, 


$$(a_1b_1 + a_2b_2)^2 = (a_1^2 + a_2^2)(b_1^2 + b_2^2) - (a_1b_2 - a_2b_1)^2, \qquad (3.10)$$


yet multiplication does not suggest how such an identity might have
been discovered. Take the more inventive path suggested by Figure 3.1
and show that the identity of Diophantus is a consequence of the most
famous theorem of all, the one universally attributed to Pythagoras
(circa 497 B.C.).


[Figure 3.1: the classic identity $1 = \cos^2(\alpha + \beta) + \sin^2(\alpha + \beta)$ permits one to deduce that $(a_1^2 + a_2^2)(b_1^2 + b_2^2)$ equals $(a_1b_1 + a_2b_2)^2 + (a_1b_2 - a_2b_1)^2$.]


Fig. 3.1. In the right light, the identity (3.10) of Diophantus and the theorem 
of Pythagoras can be seen to be fraternal twins, though one is algebraic and 
the other geometric. 


Exercise 3.2 (Brahmagupta’s Identity) 

Brahmagupta (circa 600 A.D.) established an identity which shows 
that for any integer $D$ the product of two numbers which can be written
in the form $a^2 - Db^2$ with $a, b \in \mathbb{Z}$ must be an integer of the same form.
More precisely, Brahmagupta’s identity says

$$(a^2 - Db^2)(\alpha^2 - D\beta^2) = (a\alpha + Db\beta)^2 - D(a\beta + \alpha b)^2.$$

(a) Prove Brahmagupta’s identity by evaluating the product

$$(a + b\sqrt{D})(a - b\sqrt{D})(\alpha + \beta\sqrt{D})(\alpha - \beta\sqrt{D})$$

in two different ways. Incidentally, the computation is probably more
interesting than you might guess. 

(b) Can you modify the pattern used to prove Brahmagupta’s identity 
to give another proof of the identity (3.10) of Diophantus? 
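A brute-force check over a small box of integers gives a quick sanity test of Brahmagupta’s identity, though it is no substitute for part (a). The Python sketch uses the hypothetical name `brahmagupta_holds`.

```python
from itertools import product


def brahmagupta_holds(a: int, b: int, alpha: int, beta: int, D: int) -> bool:
    left = (a * a - D * b * b) * (alpha * alpha - D * beta * beta)
    right = (a * alpha + D * b * beta) ** 2 - D * (a * beta + alpha * b) ** 2
    return left == right


# exhaustive integer check on a small box, negative D included
print(all(brahmagupta_holds(*t) for t in product(range(-3, 4), repeat=5)))   # True
```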


Exercise 3.3 (A Continuous Analogue of Lagrange’s Identity) 


Formulate and prove a continuous analogue of Lagrange’s identity. 
Next, show that your identity implies Schwarz’s inequality and finally use 
your identity to derive a necessary and sufficient condition for equality 
to hold. 


Exercise 3.4 (A Cauchy Interpolation) 
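Show for $0 \le \alpha \le 1$ and for any pair of real vectors $(a_1, a_2, \ldots, a_n)$
and $(b_1, b_2, \ldots, b_n)$ that the quantity

$$\left\{\sum_{j=1}^n a_jb_j + \alpha \sum_{1 \le j < k \le n} (a_jb_k + a_kb_j)\right\}^2$$

is bounded above by the product

$$\left\{\sum_{j=1}^n a_j^2 + 2\alpha \sum_{1 \le j < k \le n} a_ja_k\right\}\left\{\sum_{j=1}^n b_j^2 + 2\alpha \sum_{1 \le j < k \le n} b_jb_k\right\}. \qquad (3.11)$$

The charm of this bound is that for $\alpha = 0$ it reduces to Cauchy’s
inequality and for $\alpha = 1$ it reduces to the algebraic identity

$$\{(a_1 + a_2 + \cdots + a_n)(b_1 + b_2 + \cdots + b_n)\}^2 = (a_1 + a_2 + \cdots + a_n)^2(b_1 + b_2 + \cdots + b_n)^2.$$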


Thus, we have an inequality that interpolates between two known results. 
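Using exact rational values of $\alpha$ in $[0, 1]$ and integer vectors, one can test (3.11) without any rounding slack. This Python sketch (hypothetical names) checks instances only and does not suggest a proof.

```python
from fractions import Fraction
import random


def interpolation_gap(a: list[int], b: list[int], alpha: Fraction) -> Fraction:
    """Product side minus squared side of (3.11); expected >= 0 for 0 <= alpha <= 1."""
    n = len(a)

    def form(u: list[int], v: list[int]) -> Fraction:
        # sum_j u_j v_j + alpha * sum_{j<k} (u_j v_k + u_k v_j)
        cross = sum(u[j] * v[k] + u[k] * v[j] for j in range(n) for k in range(j + 1, n))
        return sum(x * y for x, y in zip(u, v)) + alpha * cross

    return form(a, a) * form(b, b) - form(a, b) ** 2


rng = random.Random(6)
gaps = []
for _ in range(2_000):
    n = rng.randint(1, 6)
    a = [rng.randint(-10, 10) for _ in range(n)]
    b = [rng.randint(-10, 10) for _ in range(n)]
    gaps.append(interpolation_gap(a, b, Fraction(rng.randint(0, 100), 100)))
print(min(gaps) >= 0)   # True: exact rational arithmetic, so no rounding slack is needed
```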


Exercise 3.5 (Monotonicity and a Ratio Bound) 
Show that if $f : [0, 1] \to (0, \infty)$ is nonincreasing, then one has

$$\frac{\int_0^1 x f^2(x)\,dx}{\int_0^1 x f(x)\,dx} \le \frac{\int_0^1 f^2(x)\,dx}{\int_0^1 f(x)\,dx}. \qquad (3.12)$$

As a hint, one might consider the possibility of proving a Lagrange
type identity by beginning with a double integral on $[0, 1] \times [0, 1]$ whose
integrand is guaranteed to be nonnegative by our monotonicity hypothesis.
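To see the bound (3.12) in a concrete case, one can approximate the four integrals numerically. The Python sketch below uses the midpoint rule (our choice) and the sample function $f(x) = e^{-x}$, which is positive and decreasing on $[0, 1]$; the names `midpoint` and `ratio_sides` are hypothetical.

```python
import math


def midpoint(g, n: int = 20_000) -> float:
    # midpoint rule for the integral of g over [0, 1]
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


def ratio_sides(f):
    """Left and right sides of (3.12) for a positive function f on [0, 1]."""
    left = midpoint(lambda x: x * f(x) ** 2) / midpoint(lambda x: x * f(x))
    right = midpoint(lambda x: f(x) ** 2) / midpoint(f)
    return left, right


left, right = ratio_sides(lambda x: math.exp(-x))   # a positive, decreasing f
print(round(left, 3), round(right, 3))              # 0.562 0.684
```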
Exercise 3.6 (Monotonicity of the Product Defect)
Show that for a pair of monotone sequences $0 \le a_1 \le a_2 \le \cdots$ and
$0 \le b_1 \le b_2 \le \cdots$ the quantities defined by

$$D_n = n\sum_{j=1}^n a_jb_j - \sum_{j=1}^n a_j \sum_{j=1}^n b_j, \qquad n = 1, 2, \ldots, \qquad (3.13)$$

are also monotone nondecreasing. Specifically, show that for each integer
$n = 1, 2, \ldots$ one has $D_n \le D_{n+1}$.
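A direct computation of the defects $D_n$ for sorted random sequences lets one watch the claimed monotonicity. This Python sketch (hypothetical name `defects`) uses running partial sums so that each $D_n$ costs constant time.

```python
import random


def defects(a: list[int], b: list[int]) -> list[int]:
    """Return [D_1, D_2, ..., D_N] from (3.13), using running partial sums."""
    out = []
    sa = sb = sab = 0
    for n, (x, y) in enumerate(zip(a, b), start=1):
        sa, sb, sab = sa + x, sb + y, sab + x * y
        out.append(n * sab - sa * sb)
    return out


rng = random.Random(8)
a = sorted(rng.randint(0, 50) for _ in range(30))
b = sorted(rng.randint(0, 50) for _ in range(30))
d = defects(a, b)
print(all(d[i] <= d[i + 1] for i in range(len(d) - 1)))   # True
```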


Exercise 3.7 (The Four-Letter Identity via Polarization) 
For any real numbers $a_j, b_j, s_j$ and $t_j$, $1 \le j \le n$, there is an identity
due independently to Binet and Cauchy which states that

$$\sum_{j=1}^n a_js_j \sum_{j=1}^n b_jt_j - \sum_{j=1}^n a_jt_j \sum_{j=1}^n b_js_j = \sum_{1 \le j < k \le n} (a_jb_k - b_ja_k)(s_jt_k - s_kt_j).$$

This generalizes Lagrange’s identity, as one can check by setting $s_j = b_j$
and $t_j = a_j$, but it is much more informative to know that the
Cauchy–Binet identity may be obtained as a corollary of the much simpler result
of Lagrange.

In fact, the passage is quite straightforward, provided one knows how
to exploit the polarization transformation

$$f(u) \mapsto \tfrac{1}{4}\{f(u + v) - f(u - v)\}.$$

This transformation carries the function $u \mapsto u^2$ into the two-variable
function $(u, v) \mapsto uv$, and it is devilishly effective at morphing identities
with squares into new ones where the squares are replaced by products.

To see how this works, check that the four-variable identity follows
from the two-variable Lagrange identity after two sequential
polarizations. To keep your calculation tidy, you may want to use the shorthand

$$\begin{vmatrix} \alpha & \beta \\ \gamma & \delta \end{vmatrix} = \alpha\delta - \beta\gamma \qquad (3.14)$$

and the easily verified identity that follows from the definition (3.14),

$$\begin{vmatrix} \alpha + \alpha' & \beta \\ \gamma + \gamma' & \delta \end{vmatrix} = \begin{vmatrix} \alpha & \beta \\ \gamma & \delta \end{vmatrix} + \begin{vmatrix} \alpha' & \beta \\ \gamma' & \delta \end{vmatrix}. \qquad (3.15)$$

This shorthand recalls the notation for the determinant of a two-by-two
matrix, but to solve this problem one does not need to know more about
determinants than the two self-evident relations (3.14) and (3.15).
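Both sides of the Cauchy–Binet identity, and the polarization step, can be evaluated for particular integer vectors. The Python sketch below uses the hypothetical names `dot`, `cauchy_binet_sides`, `polarize`, and `random_check`; it checks instances only.

```python
import random


def dot(u: list[int], v: list[int]) -> int:
    return sum(x * y for x, y in zip(u, v))


def cauchy_binet_sides(a, b, s, t) -> tuple[int, int]:
    """Both sides of the Cauchy-Binet identity, computed exactly with integers."""
    n = len(a)
    left = dot(a, s) * dot(b, t) - dot(a, t) * dot(b, s)
    right = sum((a[j] * b[k] - b[j] * a[k]) * (s[j] * t[k] - s[k] * t[j])
                for j in range(n) for k in range(j + 1, n))
    return left, right


def polarize(f, u, v):
    # the polarization transformation: f(u) -> (f(u + v) - f(u - v)) / 4
    return (f(u + v) - f(u - v)) / 4


def random_check(trials: int = 300, seed: int = 7) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 6)
        a, b, s, t = ([rng.randint(-9, 9) for _ in range(n)] for _ in range(4))
        left, right = cauchy_binet_sides(a, b, s, t)
        if left != right:
            return False
    return True


print(random_check())                     # True
print(polarize(lambda u: u * u, 3, 5))    # 15.0, which is 3 * 5
```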
Exercise 3.8 (Milne and Gauges of Proportionality)
We have seen that the form

$$Q_n = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n (a_ib_j - a_jb_i)^2$$

provides a natural measure of proportionality for the pair of vectors
$(a_1, a_2, \ldots, a_n)$ and $(b_1, b_2, \ldots, b_n)$, but one can think of other measures
of proportionality that are just as reasonable. For example, if we restrict
our attention to vectors of positive terms, then one might equally well
use the self-normalized sum

$$R = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \frac{(a_ib_j - a_jb_i)^2}{(a_i + b_i)(a_j + b_j)}. \qquad (3.16)$$

Develop an identity for $R$ that will permit you to prove the
inequality of E.A. Milne:

$$\left\{\sum_{j=1}^n (a_j + b_j)\right\}\left\{\sum_{j=1}^n \frac{a_jb_j}{a_j + b_j}\right\} \le \left\{\sum_{j=1}^n a_j\right\}\left\{\sum_{j=1}^n b_j\right\}. \qquad (3.17)$$

Next, use your identity to show that one has equality in the bound
(3.17) if and only if the vectors $(a_1, a_2, \ldots, a_n)$ and $(b_1, b_2, \ldots, b_n)$ are
proportional. Incidentally, the bound (3.17) was introduced by Milne
in 1925 to help explain the biases inherent in certain measurements of
stellar radiation.
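Milne’s inequality (3.17) can be checked exactly for positive integer vectors with rational arithmetic. This Python sketch (hypothetical names `milne_sides` and `check_milne`) compares the two sides on random data and shows one proportional and one non-proportional example; it does not reveal the identity the exercise asks for.

```python
from fractions import Fraction
import random


def milne_sides(a: list[int], b: list[int]) -> tuple[Fraction, int]:
    """Left and right sides of Milne's inequality (3.17) for positive integer vectors."""
    left = sum(x + y for x, y in zip(a, b)) * sum(Fraction(x * y, x + y) for x, y in zip(a, b))
    right = sum(a) * sum(b)
    return left, right


def check_milne(trials: int = 500, seed: int = 11) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 6)
        a = [rng.randint(1, 30) for _ in range(n)]
        b = [rng.randint(1, 30) for _ in range(n)]
        left, right = milne_sides(a, b)
        if left > right:
            return False
    return True


print(check_milne())                        # True
print(milne_sides([1, 2, 3], [2, 4, 6]))    # proportional: both sides equal 72
print(milne_sides([1, 2], [2, 1]))          # not proportional: 8 < 9
```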


4


On Geometry and Sums of Squares 


John von Neumann once said, “In mathematics you don’t understand 
things, you just get used to them.” The notion of n-dimensional space 
is now an early entrant in the mathematical curriculum, and few of us 
view it as particularly mysterious; nevertheless, for generations before 
ours this was not always the case. To be sure, our experience with the 
Pythagorean theorem in $\mathbb{R}^2$ and $\mathbb{R}^3$ is easily extrapolated to suggest
that for two points $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$ in $\mathbb{R}^d$
the distance $\rho(x, y)$ between $x$ and $y$ should be given by

$$\rho(x, y) = \sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_d - x_d)^2}, \qquad (4.1)$$


but, despite the familiarity of this formula, it still keeps some secrets. 
In particular, many of us may be willing to admit to some uncertainty 
whether it is best viewed as a theorem or as a definition. 

With proper preparation, either point of view may be supported, al- 
though the path of least resistance is surely to take the formula for 
$\rho(x, y)$ as the definition of the Euclidean distance in $\mathbb{R}^d$. Nevertheless,
there is a Faustian element to this bargain. 

First, this definition makes the Pythagorean theorem into a bland 
triviality, and we may be saddened to see our much-proved friend treated 
so shabbily. Second, we need to check that this definition of distance 
in $\mathbb{R}^d$ meets the minimal standards that one demands of a distance
function; in particular, we need to check that $\rho$ satisfies the so-called
triangle inequality, although, by a bit of luck, Cauchy’s inequality will
help us with this task. Third, and finally, we need to test the limits on
our intuition. Our experience with $\mathbb{R}^2$ and $\mathbb{R}^3$ is a powerful guide, yet
it can also mislead us, and one does well to develop a skeptical attitude 
about what is obvious and what is not. 

Even though it may be a bit like having dessert before having dinner,
we will begin with the third task. This time the problem that guides us
is framed with the help of the arrangement of circles illustrated in Figure
4.1. This simple arrangement of $5 = 2^2 + 1$ circles is not rich enough to
suggest any serious questions, but it has a $d$-dimensional analog which
puts our intuition to the test.

[Figure 4.1: in $\mathbb{R}^2$, one places a unit circle in each quadrant of the square $[-2, 2]^2$; a non-overlapping circle of maximal radius is then centered at the origin.]

Fig. 4.1. This arrangement of $5 = 2^2 + 1$ circles in $[-2, 2]^2$ has a natural
generalization to an arrangement of $2^d + 1$ spheres in $[-2, 2]^d$. This general
arrangement then provokes a question which a practical person might find
perplexing, or even silly. Does the central sphere stay inside the box $[-2, 2]^d$
for all values of $d$?


ON AN ARRANGEMENT IN $\mathbb{R}^d$


Consider the arrangement where for each of the $2^d$ points denoted
by $e = (e_1, e_2, \ldots, e_d)$ with $e_k = 1$ or $e_k = -1$ for all $1 \le k \le d$, we
have a sphere $S_e$ with unit radius and center $e$. Each of these spheres
is contained in the cube $[-2, 2]^d$ and, to complete the picture, we place
a sphere $S(d)$ at the origin that has the largest possible radius subject
to the constraint that $S(d)$ does not intersect the interior of any of the
initial collection of $2^d$ unit spheres. We then ask ourselves a question
which no normal, sensible person would ever think of asking.


Problem 4.1 (Thinking Outside the Box) 
Is the central sphere $S(d)$ contained in the cube $[-2, 2]^d$ for all $d \ge 2$?


Just posing this question provides a warning that we should not trust
our intuition here. If we rely purely on our visual imagination, it may
even seem silly to suggest that $S(d)$ might somehow expand beyond the
box $[-2, 2]^d$. Nevertheless, our visual imagination is largely rooted in
our experience with $\mathbb{R}^2$ and $\mathbb{R}^3$, and this intuition can easily fail us in
$\mathbb{R}^d$, $d \ge 4$. Instead, computation must be our guide.

Here we first note that for each of the $2^d$ outside spheres the corresponding
center point $e$ has distance $\sqrt{d}$ from the origin. Next, since
each outside sphere has radius 1, we see by subtraction that the radius of
the central sphere $S(d)$ is equal to $\sqrt{d} - 1$. Thus, we find that for $d \ge 10$
one has $\sqrt{d} - 1 > 2$, and, yes, indeed, the central sphere actually extends
beyond the box $[-2, 2]^d$. (For $d = 9$ the radius is exactly 2, so the central
sphere just touches the faces of the box.) In fact, as $d \to \infty$ the fraction of the volume
of the sphere that is inside the box even goes to zero exponentially fast.
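As a quick sanity check on this arithmetic, here is a minimal Python sketch; the helper names `central_radius` and `first_dimension_outside_box` are our own, hypothetical choices. It evaluates the radius $\sqrt{d} - 1$ of $S(d)$ and searches for the first dimension in which that radius exceeds 2, the half-width of the cube.

```python
import math

def central_radius(d: int) -> float:
    """Radius of the central sphere S(d): each center e in {-1, 1}^d lies at
    distance sqrt(d) from the origin, and each outer sphere has radius 1."""
    return math.sqrt(d) - 1.0

def first_dimension_outside_box(half_width: float = 2.0) -> int:
    """Smallest d >= 2 with sqrt(d) - 1 > half_width, i.e. the first d for
    which S(d) pokes out of the cube [-half_width, half_width]^d."""
    d = 2
    while central_radius(d) <= half_width:
        d += 1
    return d

for d in (2, 3, 4, 9, 10, 100):
    print(f"d = {d:3d}: radius of S(d) = {central_radius(d):.4f}")
print("first d with S(d) outside [-2, 2]^d:", first_dimension_outside_box())
```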


REFINING INTUITION — FACING LIMITATIONS 


When one shares this example with friends, there is usually a brief mo- 
ment of awe, but sooner or later someone says, “Why should we regard 
this as surprising? Just look how far away the point $e = (e_1, e_2, \ldots, e_d)$
is from the origin! Is it really any wonder that ...”. Such observations
illustrate how quickly (and almost subconsciously) we refine our intu- 
ition after some experience with calculations. Nevertheless, if we accept 
such remarks at face value, it is easy to become overly complacent about 
the very real limitations on our physical intuition. 

Ultimately, we may do best to take a hint from pilots who train them- 
selves to fly safely through clouds by relying on instruments rather than 
physical sensations. When we work on problems in $\mathbb{R}^d$, $d \ge 4$, we benefit
greatly from the analogy with $\mathbb{R}^2$ and $\mathbb{R}^3$, but at the end of the day, we
must rely on computation rather than visual imagination. 


MEETING THE MINIMAL REQUIREMENTS 


The example of Figure 4.1 reminds us that intuition is fallible, but 
even our computations need guidance. One way to seek help is to force 
our problem into its simplest possible form, while striving to retain its 
essential character. Thus, a complex model is often boiled down to a 
simpler abstract model where we rely on a small set of rules, or axioms, 
to help us express the minimal demands that must be met. In this way 
one hopes to remove the influence of an overly active imagination, while 
still retaining a modicum of control. 

Our next challenge is to see how the Euclidean distance (4.1) might 
pass through such a logical sieve. Thus, for a moment, we consider an 
arbitrary set $S$ and a function $\rho: S \times S \to \mathbb{R}$ that has the four following
properties:


(i) $\rho(x, y) \ge 0$ for all $x, y$ in $S$,
(ii) $\rho(x, y) = 0$ if and only if $x = y$,
(iii) $\rho(x, y) = \rho(y, x)$ for all $x, y$ in $S$, and
(iv) $\rho(x, y) \le \rho(x, z) + \rho(z, y)$ for all $x, y$ and $z$ in $S$.


These properties are intended to reflect the rock-bottom minimal requirements
that $\rho(\cdot, \cdot)$ must meet for us to be willing to think of $\rho(x, y)$
as the distance from $x$ to $y$ in $S$. A pair $(S, \rho)$ with these properties
is called a metric space, and such spaces provide the simplest possible 
setting for the study of problems that depend only on the notion of 
distance. 

When we look at the Euclidean distance $\rho$ defined by the formula
(4.1), we see at a glance that properties (i)–(iii) are met. It is perhaps
less evident that property (iv) is also satisfied, but the next challenge 
problem invites one to confirm this fact. The challenge is easily met, 
yet along the way we will find a simple relationship between the triangle 
inequality and Cauchy’s inequality that puts Cauchy’s inequality on a 
new footing. Ironically, the axiomatic approach to Euclidean distance 
adds greatly to the intuitive mastery of Cauchy’s inequality. 


Problem 4.2 (Triangle Inequality for Euclidean Distance) 
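Show that the function $\rho: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ defined by

$$\rho(x, y) = \sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_d - x_d)^2} \tag{4.2}$$

satisfies the triangle inequality

$$\rho(x, y) \le \rho(x, z) + \rho(z, y) \quad \text{for all } x, y \text{ and } z \text{ in } \mathbb{R}^d. \tag{4.3}$$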


To solve this problem, we first note from the definition (4.2) of $\rho$ that
one has the translation property that $\rho(x + w, y + w) = \rho(x, y)$ for all
$w \in \mathbb{R}^d$; thus, to prove the triangle inequality (4.3), it suffices (after
translating by $w = -x$ and writing $u = z - x$, $v = y - z$) to show
that for all $u$ and $v$ in $\mathbb{R}^d$ one has

$$\rho(0, u + v) \le \rho(0, u) + \rho(u, u + v) = \rho(0, u) + \rho(0, v). \tag{4.4}$$


By squaring this inequality and applying the definition (4.2), we see that
the target inequality (4.3) is also equivalent to

$$\sum_{j=1}^d (u_j + v_j)^2 \le \sum_{j=1}^d u_j^2 + 2\Big(\sum_{j=1}^d u_j^2\Big)^{1/2}\Big(\sum_{j=1}^d v_j^2\Big)^{1/2} + \sum_{j=1}^d v_j^2,$$

and this in turn may be simplified to the equivalent bound

$$\sum_{j=1}^d u_j v_j \le \Big(\sum_{j=1}^d u_j^2\Big)^{1/2}\Big(\sum_{j=1}^d v_j^2\Big)^{1/2}.$$

Thus, in the end, one finds that the triangle inequality for the Euclidean
distance is equivalent to Cauchy's inequality.
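The equivalence can also be seen numerically. Expanding the squares shows that $(\rho(0,u) + \rho(0,v))^2 - \rho(0,u+v)^2 = 2\{\rho(0,u)\rho(0,v) - \sum_j u_j v_j\}$, so the squared triangle gap and twice the Cauchy defect are the same number. The Python sketch below (helper names hypothetical) checks this identity on random vectors; it illustrates, but of course does not prove, the equivalence.

```python
import math
import random

def dist(x, y):
    """Euclidean distance (4.2) between two points given as equal-length lists."""
    return math.sqrt(sum((yj - xj) ** 2 for xj, yj in zip(x, y)))

def triangle_gap(u, v):
    """(rho(0,u) + rho(0,v))^2 - rho(0,u+v)^2, the squared form of (4.4)."""
    zero = [0.0] * len(u)
    w = [uj + vj for uj, vj in zip(u, v)]
    return (dist(zero, u) + dist(zero, v)) ** 2 - dist(zero, w) ** 2

def cauchy_gap(u, v):
    """2 * (rho(0,u) rho(0,v) - sum u_j v_j), twice the Cauchy defect."""
    zero = [0.0] * len(u)
    inner = sum(uj * vj for uj, vj in zip(u, v))
    return 2.0 * (dist(zero, u) * dist(zero, v) - inner)

rng = random.Random(0)
for _ in range(1000):
    d = rng.randint(1, 8)
    u = [rng.uniform(-5, 5) for _ in range(d)]
    v = [rng.uniform(-5, 5) for _ in range(d)]
    # The two gaps agree, so one is nonnegative exactly when the other is.
    assert math.isclose(triangle_gap(u, v), cauchy_gap(u, v), rel_tol=1e-9, abs_tol=1e-9)
print("squared triangle gap equals twice the Cauchy gap on all samples")
```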


SOME NOTATION AND A MODEST GENERALIZATION 


The definition (4.2) of $\rho$ can be written quite briefly with help from the
standard inner product $\langle u, v\rangle = u_1 v_1 + u_2 v_2 + \cdots + u_d v_d$, and, instead of
(4.2), one can simply write $\rho(x, y) = \langle y - x, y - x\rangle^{1/2}$. This observation
suggests a generalization of the Euclidean distance that turns out to
have far-reaching consequences.

To keep the logic of the generalization organized in a straight line, we 
begin with a formal definition. If $V$ is a real vector space, such as $\mathbb{R}^d$,
we say that the function from $V$ to $\mathbb{R}^+$ defined by the mapping $v \mapsto \|v\|$
is a norm on $V$ provided that it satisfies the following properties:

(i) $\|v\| = 0$ if and only if $v = 0$,
(ii) $\|\alpha v\| = |\alpha|\,\|v\|$ for all $\alpha \in \mathbb{R}$, and
(iii) $\|u + v\| \le \|u\| + \|v\|$ for all $u$ and $v$ in $V$.


Also, if $V$ is a vector space and $\|\cdot\|$ is a norm on $V$, then the couple
$(V, \|\cdot\|)$ is called a normed linear space. The arguments of the preceding
section can now be repeated to establish two related, but logically
independent, observations:

(I). If $(V, \langle\cdot,\cdot\rangle)$ is an inner product space, then $\|v\| = \langle v, v\rangle^{1/2}$ defines a
norm on $V$. Thus, to each inner product space $(V, \langle\cdot,\cdot\rangle)$ we can associate
a natural normed linear space $(V, \|\cdot\|)$.

(II). If $(V, \|\cdot\|)$ is a normed linear space, then $\rho(x, y) = \|x - y\|$ defines
a metric on $V$. Thus, to each normed linear space we can associate a
natural metric space $(V, \rho(\cdot,\cdot))$.


Here one should note that the three notions of an inner product space,
a normed linear space, and a metric space are notions of strictly increasing
generality. The space $S$ with just two points $x$ and $y$ where $\rho$ is
defined by setting $\rho(x, x) = \rho(y, y) = 0$ and $\rho(x, y) = 1$ is a metric
space, but it certainly is not an inner product space: the set $S$ is not
even a vector space. Later, in Chapter 9, we will also meet normed linear 
spaces that are not inner product spaces. 


HOW MUCH INTUITION?


According to an old (and possibly apocryphal) story, during one of 
his lectures David Hilbert once wrote a line on the blackboard and said, 
“It is obvious that ...,” but then Hilbert paused and thought for a
moment. He then became noticeably perplexed, and he even left the
room, returning only after an awkward passage of time. When Hilbert 
resumed his lecture, he began by saying “It is obvious that ....” 

One of the tasks we assign ourselves as students of mathematics is to 
sort out for ourselves what is obvious and what is not. Oddly enough, 
this is not always an easy task. In particular, if we ask ourselves if the
triangle inequality is obvious in $\mathbb{R}^d$ for $d \ge 4$, we may face a situation
which is similar to the one that perplexed Hilbert. 

The very young child who takes the diagonal across the park shows an
intuitive understanding of the essential truth of the triangle inequality in
$\mathbb{R}^2$. Moreover, anyone with some experience with $\mathbb{R}^d$ understands that
if we ask a question about the relationship of three points in $\mathbb{R}^d$, $d \ge 3$,
then we are “really” posing a problem in the two-dimensional plane that
contains those points. These observations support the assertion that the
triangle inequality in $\mathbb{R}^d$ is obvious.

The triangle inequality is indeed true in $\mathbb{R}^d$, so one cannot easily refute
the claim of someone who says that it is flatly obvious. Nevertheless,
algebra can be relied upon in ways that geometry cannot, and we already
know from the example of Figure 4.1 that our experience with $\mathbb{R}^2$ can be
misleading, or at least temporarily misleading. Sometimes questions are
better than answers and, for the moment at least, we will let the issue of 
the obviousness of the triangle inequality remain a part of our continuing 
conversation. A more pressing issue is to understand the distance from 
a point to a line. 


A CLOSEST POINT PROBLEM 


For any point $x \ne 0$ in $\mathbb{R}^d$ there is a unique line $\mathcal{L}$ through $x$ and the
origin $0 \in \mathbb{R}^d$, and one can write this line explicitly as $\mathcal{L} = \{tx : t \in \mathbb{R}\}$.
The closest point problem is the task of determining the point on $\mathcal{L}$ that
is closest to a given point $v \in \mathbb{R}^d$. By what may seem at first to be
very good luck, there is an explicit formula for this closest point that
one may write neatly with help from the standard inner product
$\langle v, x\rangle = v_1 x_1 + v_2 x_2 + \cdots + v_d x_d$.


Problem 4.3 (Projection Formula) 


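For each $v$ and each $x \ne 0$ in $\mathbb{R}^d$, let $P(v)$ denote the point on the
line $\mathcal{L} = \{tx : t \in \mathbb{R}\}$ that is closest to $v$. Show that one has

$$P(v) = \frac{\langle x, v\rangle}{\langle x, x\rangle}\, x. \tag{4.5}$$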
The point $P(v) \in \mathcal{L}$ is called the projection of $v$ on $\mathcal{L}$, and the formula
(4.5) for $P(v)$ has many important applications in statistics and engineering,
as well as in mathematics. Anyone who is already familiar with
a proof of this formula should rise to this challenge by looking for a new 
proof. In fact, the projection formula (4.5) is wonderfully provable, and 
successful derivations may be obtained by calculus, by algebra, or even 
by direct arguments which require nothing more than a clever guess and 
Cauchy’s inequality. 


A LOGICAL CHOICE 


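The proof by algebra is completely elementary and relatively uncommon,
so it seems like a logical choice for us. To find the value of $t \in \mathbb{R}$
that minimizes $\rho(v, tx)$, we can just as easily try to minimize its square

$$\rho^2(v, tx) = \langle v - tx, v - tx\rangle,$$

which has the benefit of being a quadratic polynomial in $t$. If we look
back on our earlier experience with such polynomials, then we will surely
think of completing the square, and by doing so we find

$$\begin{aligned}
\langle v - tx, v - tx\rangle &= \langle v, v\rangle - 2t\langle v, x\rangle + t^2\langle x, x\rangle \\
&= \langle x, x\rangle\Big(t^2 - 2t\frac{\langle v, x\rangle}{\langle x, x\rangle}\Big) + \langle v, v\rangle \\
&= \langle x, x\rangle\Big(t - \frac{\langle v, x\rangle}{\langle x, x\rangle}\Big)^2 + \langle v, v\rangle - \frac{\langle v, x\rangle^2}{\langle x, x\rangle}.
\end{aligned}$$

Thus, in the end, we see that $\rho^2(v, tx)$ has the nice representation

$$\rho^2(v, tx) = \langle x, x\rangle\Big(t - \frac{\langle v, x\rangle}{\langle x, x\rangle}\Big)^2 + \frac{\langle v, v\rangle\langle x, x\rangle - \langle v, x\rangle^2}{\langle x, x\rangle}. \tag{4.6}$$

From this formula we see at a glance that $\rho(v, tx)$ is minimized when we
take $t = \langle v, x\rangle/\langle x, x\rangle$, and since this coincides exactly with the assertion
of the projection formula (4.5), the solution of the challenge problem is
complete.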
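The completing-the-square argument is easy to corroborate by brute force. The minimal Python sketch below (function names hypothetical) computes $P(v)$ from the projection formula (4.5) and confirms that no value of $t$ on a fine grid gives a smaller value of $\rho^2(v, tx)$ than $t = \langle v, x\rangle/\langle x, x\rangle$.

```python
def inner(a, b):
    """Standard inner product on R^d for equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def projection(v, x):
    """P(v) from (4.5): the point t*x with t = <v, x>/<x, x>; x must be nonzero."""
    t = inner(v, x) / inner(x, x)
    return [t * xi for xi in x]

def dist_sq(v, x, t):
    """rho^2(v, t x) = <v - t x, v - t x>."""
    r = [vi - t * xi for vi, xi in zip(v, x)]
    return inner(r, r)

v, x = [3.0, 1.0, 2.0], [1.0, 1.0, 0.0]
t_star = inner(v, x) / inner(x, x)   # here 4/2 = 2
best = dist_sq(v, x, t_star)

# Scan t over [-5, 5] in steps of 0.001: nothing beats t_star.
grid = [k / 1000 for k in range(-5000, 5001)]
assert all(dist_sq(v, x, t) >= best for t in grid)
print("P(v) =", projection(v, x), " min rho^2 =", best)
```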
AN ACCIDENTAL COROLLARY — CAUCHY—SCHWARZ AGAIN 


If we set $t = \langle v, x\rangle/\langle x, x\rangle$ in the formula (4.6), then we find that

$$\min_{t \in \mathbb{R}} \rho^2(v, tx) = \frac{\langle v, v\rangle\langle x, x\rangle - \langle v, x\rangle^2}{\langle x, x\rangle}, \tag{4.7}$$

and, since the left-hand side is obviously nonnegative, we discover that
our calculation has provided a small unanticipated bonus. The numerator
on the right-hand side of the identity (4.7) must also be nonnegative,
and this observation gives us yet another proof of the Cauchy–Schwarz
inequality.

[Figure 4.2: the line $\mathcal{L} = \{tx : t \in \mathbb{R}\}$, a point $v$, its projection $P(v) = tx$
with $t = \langle v, x\rangle/\langle x, x\rangle$, and the residual $r = v - P(v)$.]

Fig. 4.2. The closest point on the line $\mathcal{L}$ to the point $v \in \mathbb{R}^d$ is the point
$P(v)$. It is called the projection of $v$ onto $\mathcal{L}$, and either by calculus, or by
completing the square, or by direct arguments using Cauchy's inequality,
one can show that $P(v) = x\langle x, v\rangle/\langle x, x\rangle$. One way to characterize the
projection $P(v)$ is that it is the unique element of $\mathcal{L}$ such that $r = v - P(v)$ is
orthogonal to the vector $x$ which determines the line $\mathcal{L}$.

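There are even two further benefits to the formula (4.7). First, it
gives us a geometrical interpretation of the defect $\langle v, v\rangle\langle x, x\rangle - \langle v, x\rangle^2$:
it is $\langle x, x\rangle$ times the squared distance from $v$ to the line $\mathcal{L}$.
Second, it tells us at a glance that one has $\langle v, v\rangle\langle x, x\rangle = \langle v, x\rangle^2$ if and
only if $v$ is an element of the line $\mathcal{L} = \{tx : t \in \mathbb{R}\}$, which is a simple
geometric interpretation of our earlier characterization of the case of
equality.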
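Identity (4.7) and its geometric reading can be spot-checked numerically. In this Python sketch (helper names hypothetical), `cs_defect` computes $\langle v, v\rangle\langle x, x\rangle - \langle v, x\rangle^2$ and `min_dist_sq` computes the squared distance from $v$ to $\mathcal{L}$ at the minimizing $t$; the defect should equal $\langle x, x\rangle$ times that distance, and it should vanish when $v$ lies on $\mathcal{L}$.

```python
import random

def inner(a, b):
    """Standard inner product on R^d."""
    return sum(ai * bi for ai, bi in zip(a, b))

def min_dist_sq(v, x):
    """min over t of rho^2(v, t x), attained at t = <v, x>/<x, x>."""
    t = inner(v, x) / inner(x, x)
    r = [vi - t * xi for vi, xi in zip(v, x)]
    return inner(r, r)

def cs_defect(v, x):
    """<v, v><x, x> - <v, x>^2, the Cauchy-Schwarz defect."""
    return inner(v, v) * inner(x, x) - inner(v, x) ** 2

rng = random.Random(1)
for _ in range(500):
    x = [rng.uniform(-3, 3) for _ in range(4)]
    v = [rng.uniform(-3, 3) for _ in range(4)]
    # Identity (4.7): defect = <x, x> * (squared distance from v to the line).
    lhs, rhs = cs_defect(v, x), inner(x, x) * min_dist_sq(v, x)
    assert abs(lhs - rhs) <= 1e-9 * max(1.0, abs(lhs))

# A point on the line gives zero defect: the case of equality.
print(cs_defect([2.0, -4.0, 6.0], [1.0, -2.0, 3.0]))
```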


HOW TO GUESS THE PROJECTION FORMULA


Two elements $x$ and $y$ of an inner product space $(V, \langle\cdot,\cdot\rangle)$ are said to be
orthogonal if $\langle x, y\rangle = 0$, and one can check without difficulty that if $\langle\cdot,\cdot\rangle$
is the standard inner product on $\mathbb{R}^2$ or $\mathbb{R}^3$, then this modestly abstract
notion of orthogonality corresponds to the traditional notion of orthogonality,
or perpendicularity, which one meets in Euclidean geometry. If
we combine this abstract definition with our intuitive understanding of
$\mathbb{R}^2$, then, almost without calculation, we can derive a convincing guess
for a formula for the projection $P(v)$.

For example, in Figure 4.2 our geometric intuition suggests that it is
“obvious” (that tricky word again!) that if we want to choose $t$ such
that $P(v)$ is the closest point to $v$ on $\mathcal{L}$, then we need to choose $t$ so
that the line from $P(v)$ to $v$ should be orthogonal to the line $\mathcal{L}$. In
symbols, this means that we should choose $t$ such that

$$\langle x, v - tx\rangle = 0 \quad\text{or}\quad t = \langle x, v\rangle/\langle x, x\rangle.$$

We already know this is the value of $t$ which yields the projection formula
(4.5), so, this time at least, our intuition has given us good guidance.
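The orthogonality guess is easy to test. The short Python sketch below (names hypothetical) solves $\langle x, v - tx\rangle = 0$ for $t$ and checks that the residual $r = v - tx$ is orthogonal to $x$ for that choice of $t$, but not for another one.

```python
def inner(a, b):
    """Standard inner product on R^d."""
    return sum(ai * bi for ai, bi in zip(a, b))

def residual(v, x, t):
    """r = v - t x, the vector from the point t x on the line back to v."""
    return [vi - t * xi for vi, xi in zip(v, x)]

v, x = [1.0, 5.0], [2.0, 1.0]
t_guess = inner(x, v) / inner(x, x)        # solves <x, v - t x> = 0; here 7/5
print(t_guess, inner(x, residual(v, x, t_guess)))   # second number is 0 up to rounding
print(inner(x, residual(v, x, 1.0)))       # another t: r is not orthogonal to x
```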

If we are so inclined, we can even turn this guess into a proof. Specif- 
ically, we can use Cauchy’s inequality to prove that this guess for t is 
actually the optimal choice. Such an argument provides us with a sec- 
ond, logically independent, derivation of the projection formula. This 
would be an instructive exercise, but it seems better to move directly
to a harder challenge. 


REFLECTIONS AND PRODUCTS OF LINEAR FORMS 


The projection formula and the closest point problem provide us with 
important new perspectives, but eventually one has to ask how these 
help us with our main task of discovering and proving useful inequalities. 
The next challenge problem clears this hurdle by suggesting an elegant 
bound which might be hard to discover (or to prove) without guidance 
from the geometry of $\mathbb{R}^n$.


Problem 4.4 (A Bound for the Product of Two Linear Forms) 

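Show that for all real $u_j$, $v_j$, and $x_j$, $1 \le j \le n$, one has the following
upper bound for a product of two linear forms:

$$\sum_{j=1}^n u_j x_j \sum_{j=1}^n v_j x_j \le \frac{1}{2}\Big\{\sum_{j=1}^n u_j v_j + \Big(\sum_{j=1}^n u_j^2\Big)^{1/2}\Big(\sum_{j=1}^n v_j^2\Big)^{1/2}\Big\}\sum_{j=1}^n x_j^2. \tag{4.8}$$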

The charm of this inequality is that it leverages the presence of two
sums to obtain a bound that is sharper than the inequality which one
would obtain from two applications of Cauchy's inequality to the individual
multiplicands. In fact, when $\langle u, v\rangle \le 0$ the new bound does better by
at least a factor of one-half, and, even if the vectors $u = (u_1, u_2, \ldots, u_n)$
and $v = (v_1, v_2, \ldots, v_n)$ are proportional, the bound (4.8) is not worse
than the one provided by Cauchy's inequality. Thus, the new inequality
(4.8) provides us with a win-win situation whenever we need to estimate
the product of two sums.
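Before turning to the proof, one can watch both claims of this paragraph on random data. The Python sketch below (helper names hypothetical) checks (4.8) itself, checks that its right side never exceeds the bound obtained from two separate uses of Cauchy's inequality, and checks the factor-of-one-half gain when $\langle u, v\rangle \le 0$. The small `leq` helper only absorbs floating-point rounding.

```python
import math
import random

def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(inner(a, a))

def lhs_48(u, v, x):
    """Left side of (4.8): (sum u_j x_j) * (sum v_j x_j)."""
    return inner(u, x) * inner(v, x)

def rhs_48(u, v, x):
    """Right side of (4.8): (1/2) * (<u, v> + ||u|| ||v||) * sum x_j^2."""
    return 0.5 * (inner(u, v) + norm(u) * norm(v)) * inner(x, x)

def rhs_cauchy(u, v, x):
    """Cauchy's inequality applied to each factor: ||u|| ||x|| * ||v|| ||x||."""
    return norm(u) * norm(v) * inner(x, x)

def leq(a, b, tol=1e-9):
    """a <= b up to floating-point slack."""
    return a <= b + tol * (1.0 + abs(b))

rng = random.Random(2)
for _ in range(2000):
    n = rng.randint(1, 6)
    u = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    x = [rng.gauss(0, 1) for _ in range(n)]
    assert leq(lhs_48(u, v, x), rhs_48(u, v, x))            # inequality (4.8)
    assert leq(rhs_48(u, v, x), rhs_cauchy(u, v, x))        # never worse than Cauchy
    if inner(u, v) <= 0:
        assert leq(rhs_48(u, v, x), 0.5 * rhs_cauchy(u, v, x))  # factor 1/2 gain
```

For instance, with $u = (1, 0)$, $v = (0, 1)$, $x = (1, 1)$ the left side of (4.8) is 1, the new bound is 1, and the Cauchy bound is 2, so here (4.8) is attained.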


FOUNDATIONS FOR A PROOF 


This time we will take an indirect approach to our problem and, at 
first, we will only try to deepen our understanding of the geometry 
of projection on a line. We begin by noting that Figure 4.2 strongly 
suggests that the projection $P$ onto the line $\mathcal{L} = \{tx : t \in \mathbb{R}\}$ must
satisfy the bound

$$\|P(v)\| \le \|v\| \quad\text{for all } v \in \mathbb{R}^d \tag{4.9}$$

and, moreover, one even expects strict inequality here unless $v \in \mathcal{L}$.
In fact, the proof of the bound (4.9) is quite easy since the projection
formula (4.5) and Cauchy's inequality give us

$$\|P(v)\| = \Big|\frac{\langle x, v\rangle}{\langle x, x\rangle}\Big|\,\|x\| = \frac{1}{\|x\|}\,|\langle x, v\rangle| \le \|v\|.$$


FROM PROJECTION TO REFLECTION 


We also face a similar situation when we consider the reflection of the
point $v$ through the line $\mathcal{L}$, say as illustrated by Figure 4.3. Formally,
the reflection of the point $v$ in the line $\mathcal{L}$ is the point $R(v)$ defined by
the formula $R(v) = 2P(v) - v$. In some ways, the reflection $R(v)$ is an
even more natural object than the projection $P(v)$. In particular, one
can guess from Figure 4.2 that the mapping $R: V \to V$ has the pleasing
length-preserving property

$$\|R(v)\| = \|v\| \quad\text{for all } v \in \mathbb{R}^d. \tag{4.10}$$
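One can prove this identity by a direct calculation with the projection
formula, but that calculation is most neatly organized if we first observe
some general properties of $P$. In particular, we have the nice formula

$$\langle P(v), P(v)\rangle = \Big\langle \frac{\langle x, v\rangle x}{\langle x, x\rangle}, \frac{\langle x, v\rangle x}{\langle x, x\rangle}\Big\rangle = \frac{\langle x, v\rangle^2}{\langle x, x\rangle},$$

while at the same time we also have

$$\langle P(v), v\rangle = \Big\langle \frac{\langle x, v\rangle x}{\langle x, x\rangle}, v\Big\rangle = \frac{\langle x, v\rangle^2}{\langle x, x\rangle},$$

so we may combine these observations to obtain

$$\langle P(v), P(v)\rangle = \langle P(v), v\rangle.$$

This useful identity now provides a quick confirmation of the length-preserving
(or isometry) property of the reflection $R$; we just expand
the inner product and simplify to find

$$\begin{aligned}
\|R(v)\|^2 &= \langle 2P(v) - v, 2P(v) - v\rangle \\
&= 4\langle P(v), P(v)\rangle - 4\langle P(v), v\rangle + \langle v, v\rangle \\
&= \langle v, v\rangle.
\end{aligned}$$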
[Figure 4.3: a point $v$, the line $\mathcal{L} = \{tx : t \in \mathbb{R}\}$, and $R(v)$, the reflection
of the point $v \in \mathbb{R}^n$ in the line $\mathcal{L}$.]

Fig. 4.3. When the point $v$ is reflected in the line $\mathcal{L}$ one obtains a new point
$R(v)$ which is the same distance from the origin as $v$. More formally, the
reflection of $v$ is the point $R(v)$ defined by the formula $R(v) = 2P(v) - v$.
One can then use the projection formula for $P$ to prove that $\|R(v)\| = \|v\|$.
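The bound (4.9), the identity $\langle P(v), P(v)\rangle = \langle P(v), v\rangle$, and the isometry property (4.10) can all be exercised on random data. The Python sketch below uses hypothetical helper names `project` and `reflect`, with $R(v) = 2P(v) - v$ exactly as defined above.

```python
import math
import random

def inner(a, b):
    """Standard inner product on R^d."""
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(inner(a, a))

def project(v, x):
    """P(v) = (<x, v>/<x, x>) x, the projection (4.5) onto the line through x != 0."""
    t = inner(x, v) / inner(x, x)
    return [t * xi for xi in x]

def reflect(v, x):
    """R(v) = 2 P(v) - v, the reflection of v in the line through x."""
    return [2 * pj - vj for pj, vj in zip(project(v, x), v)]

rng = random.Random(3)
for _ in range(1000):
    x = [rng.gauss(0, 1) for _ in range(5)]
    v = [rng.gauss(0, 1) for _ in range(5)]
    p = project(v, x)
    assert norm(p) <= norm(v) + 1e-12                                   # (4.9)
    assert math.isclose(inner(p, p), inner(p, v), rel_tol=1e-9, abs_tol=1e-12)
    assert math.isclose(norm(reflect(v, x)), norm(v), rel_tol=1e-9)     # (4.10)
```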


RETURN TO THE CHALLENGE 


The geometry of the reflection through the line $\mathcal{L} = \{tx : t \in \mathbb{R}\}$
is easily understood, but sometimes the associated algebra can offer a
pleasant surprise. For example, the isometry property of the reflection
$R$ and the Cauchy–Schwarz inequality can be combined to provide an
almost immediate solution of our challenge problem.

From the Cauchy–Schwarz inequality and the isometry property of
the reflection $R$ we have the bound

$$\langle R(u), v\rangle \le \|R(u)\|\,\|v\| = \|u\|\,\|v\|, \tag{4.11}$$

while on the other hand, the definition of $R$ and the projection formula
give us the identity

$$\langle R(u), v\rangle = \langle 2P(u) - u, v\rangle = 2\langle P(u), v\rangle - \langle u, v\rangle = 2\frac{\langle x, u\rangle\langle x, v\rangle}{\langle x, x\rangle} - \langle u, v\rangle.$$

Thus, from Cauchy–Schwarz and the isometry bound (4.11) we have

$$2\frac{\langle x, u\rangle\langle x, v\rangle}{\langle x, x\rangle} - \langle u, v\rangle \le \|u\|\,\|v\|,$$

and this may be arranged more naturally as

$$\langle x, u\rangle\langle x, v\rangle \le \frac{1}{2}\big(\langle u, v\rangle + \|u\|\,\|v\|\big)\,\|x\|^2. \tag{4.12}$$

If we now interpret these inner products as the standard inner products
on $\mathbb{R}^n$, then we see that the bound (4.12) is precisely the inequality (4.8)
of the challenge problem.

Thus, almost by accident, we find that the geometry of reflection has 
brought us to a new and informative refinement of Cauchy’s inequal- 
ity. Such accidents are common, and they form a thread from which 
Scheherazade could spin a thousand tales, all with the name symme- 
try and its applications. We will revisit this theme, but first we seek a 
different kind of contribution from a different kind of geometry. 


THE LIGHT CONE INEQUALITY 


The preceding examples suggested how Euclidean geometry helps to 
deepen our understanding of the theory of inequalities, but the tradi- 
tional geometry of Euclid is not the only one that helps in this way. 
Other geometries, or geometric models, can do their part. 

One especially attractive example calls on the famous space-time ge- 
ometry of Einstein and Minkowski. The physical background of this 
model is not needed here, but, for motivation, it is useful to recall one 
fundamental principle of special relativity: no information of any kind 
can travel faster than the speed of light. 

If we scale space so that the speed of light is 1, this principle
implies that each point $x = (t; x_1, x_2, \ldots, x_d)$ of time and space where
one can have knowledge of an event that takes place at the origin at
time 0 must satisfy the bound

$$\sqrt{x_1^2 + x_2^2 + \cdots + x_d^2} \le t. \tag{4.13}$$

The set $C$ of all such points in $\mathbb{R}^+ \times \mathbb{R}^d$ is called Minkowski's light cone,
and it is illustrated in Figure 4.4.

The only further notion that we need is the Lorentz product, which is
the bilinear form defined for pairs of elements $x = (t; x_1, x_2, \ldots, x_d)$ and
$y = (u; y_1, y_2, \ldots, y_d)$ in the light cone $C$ by the formula

$$[x, y] = tu - x_1 y_1 - x_2 y_2 - \cdots - x_d y_d. \tag{4.14}$$


This form was introduced by the Dutch physicist Hendrik
Antoon Lorentz (1853–1928), who used it to simplify some of the formulas
of special relativity, but for us the interesting feature of the Lorentz
product is its relationship to the Cauchy–Schwarz inequality. It turns
out that the Lorentz product satisfies an inequality which has a superficial
resemblance to the Cauchy–Schwarz inequality, except for one
remarkable twist: the inequality is exactly reversed!
[Figure 4.4: Minkowski's light cone,
$C = \{(t; x_1, x_2, \ldots, x_d) : (x_1^2 + x_2^2 + \cdots + x_d^2)^{1/2} \le t\}$.]

Fig. 4.4. Minkowski's light cone $C$ is the region of space-time $\mathbb{R}^+ \times \mathbb{R}^d$ where
one can have knowledge of an event that takes place at the origin at time zero.
Here time is scaled so that the speed of light is equal to one.


Problem 4.5 (Light Cone Inequality) 

Show that if $x$ and $y$ are points of $\mathbb{R}^+ \times \mathbb{R}^d$ that are elements of the
light cone $C$ defined in Figure 4.4, then the Lorentz product satisfies the
inequality

$$[x, x]^{1/2}[y, y]^{1/2} \le [x, y]. \tag{4.15}$$

Show, moreover, that if $x = (t; x_1, x_2, \ldots, x_d)$ and $y = (u; y_1, y_2, \ldots, y_d)$,
then the inequality (4.15) is strict unless $u x_j = t y_j$ for all $1 \le j \le d$.


DEVELOPMENT OF A PLAN 


If the Cauchy-Schwarz Master Class were to have a final exam, then 
the light cone inequality would provide fertile ground for the develop- 
ment of good problems. One can prove the light cone inequality with 
almost any reasonable tool — induction, the AM-GM inequality, or even 
a Lagrange-type identity will do the job. Here we will explore a lazy and 
devious route, precisely the kind favored by most mathematicians. 

Since our goal is to prove a reversal of the Cauchy—Schwarz inequality, 
a pleasantly outrageous plan would be to look for some way to invert the 
famous polynomial argument of Schwarz (say, as described in Chapter 1, 
on page 11). In Schwarz’s argument, one constructs a quadratic polyno- 
mial, makes an observation about its roots, and then draws a conclusion 
about the coefficients of the polynomial. That is just what we will try 
here — with some necessary changes. After all, we want a different con- 
clusion about the coefficients, so we need to make a different observation 
about the roots. 

In imitation of Schwarz's argument, we introduce the quadratic polynomial

\begin{align}
p(\lambda) &= [x - \lambda y, x - \lambda y] = [x, x] - 2\lambda[x, y] + \lambda^2[y, y] \tag{4.16}\\
&= (t - \lambda u)^2 - \sum_{j=1}^d (x_j - \lambda y_j)^2, \tag{4.17}
\end{align}

and we immediately address ourselves to understanding its roots. To
side-step trivialities, we first note that if $t = 0$ then our assumption that
$x = (t; x_1, x_2, \ldots, x_d) \in C$ tells us that $x = 0$. In this case, the light
cone inequality (4.15) is trivially true, so, without loss of generality, we
can assume that $t \ne 0$; by the same reasoning applied to $y$, we can also
assume that $u \ne 0$.

[Figure 4.5: the graph of a parabola $p(\lambda)$; a parabola with both negative and
positive values must have two real roots.]

Fig. 4.5. Schwarz's proof of the Cauchy–Schwarz inequality exploited the
bound on the coefficients of a polynomial without real roots; in contrast,
the proof of Minkowski's light cone inequality exploits the information that one gets
from knowing a quadratic polynomial has two real roots.

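Next, for space-time vectors $x$ and $y$ in $C$ one sees from Cauchy's inequality
and the definition of the light cone that the spatial components
$(x_1, x_2, \ldots, x_d)$ and $(y_1, y_2, \ldots, y_d)$ must satisfy the bound

$$x_1 y_1 + x_2 y_2 + \cdots + x_d y_d \le (x_1^2 + \cdots + x_d^2)^{1/2}(y_1^2 + \cdots + y_d^2)^{1/2} \le tu.$$

In the language of the Lorentz product, this says $[x, y] \ge 0$, and as a
consequence we see that the light cone inequality is trivially true whenever
$[x, x] = 0$ or $[y, y] = 0$. Thus, without loss of generality, we can
assume both of these Lorentz products are nonzero (and hence positive,
since $[x, x] = t^2 - (x_1^2 + \cdots + x_d^2) \ge 0$ on $C$, and likewise for $y$).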

Now, we are ready for the main argument. For $u \ne 0$, we may then
take $\lambda_0 = t/u$, and the first term $(t - \lambda_0 u)^2$ of the expanded polynomial (4.17)
vanishes. We then see that either (i) $u x_j = t y_j$ for all $1 \le j \le d$ or
else we have (ii) $p(\lambda_0) < 0$. In the first situation, we have the case of
equality which was suggested by the challenge problem, so to complete
the solution we just need to confirm that in the second situation we have
the required strict inequality.

Since we have assumed that $[y, y] > 0$, we see from the expanded form
(4.16) that $p(\lambda) \to \infty$ as $\lambda \to \infty$ or $\lambda \to -\infty$, and we know that $p(\lambda_0) < 0$,
so the equation $p(\lambda) = A\lambda^2 - 2B\lambda + C = 0$ must have two distinct real
roots. The quadratic formula then tells us that $AC < B^2$.
When we identify the coefficients of $p(\lambda)$ from (4.16) we find
$A = [y, y]$, $B = [x, y]$, and $C = [x, x]$, so $AC < B^2$ gives
us the strict inequality $[x, x][y, y] < [x, y]^2$. Since $[x, y] \ge 0$, taking
square roots yields the strict form of (4.15), which we hoped to show.
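The reversed inequality is easy to watch in action. The Python sketch below implements the Lorentz product (4.14) and the cone condition (4.13); the sampler `random_cone_point` is a hypothetical helper of ours that draws a spatial part and then a time coordinate at least as large as its length. It checks $[x, y] \ge 0$ and the squared form $[x, x][y, y] \le [x, y]^2$ of (4.15), and ends with a proportional pair, where equality holds.

```python
import math
import random

def lorentz(x, y):
    """Lorentz product (4.14) for x = (t, x_1, ..., x_d), y = (u, y_1, ..., y_d)."""
    return x[0] * y[0] - sum(a * b for a, b in zip(x[1:], y[1:]))

def in_light_cone(x):
    """Cone condition (4.13): sqrt(x_1^2 + ... + x_d^2) <= t."""
    return math.sqrt(sum(a * a for a in x[1:])) <= x[0]

def random_cone_point(d, rng):
    """Hypothetical sampler: random spatial part, then a time coordinate
    at least as large as the Euclidean length of that spatial part."""
    space = [rng.uniform(-1, 1) for _ in range(d)]
    t = math.sqrt(sum(a * a for a in space)) + rng.uniform(0, 1)
    return [t] + space

rng = random.Random(4)
for _ in range(2000):
    x, y = random_cone_point(3, rng), random_cone_point(3, rng)
    assert in_light_cone(x) and in_light_cone(y)
    xy = lorentz(x, y)
    assert xy >= -1e-12                     # [x, y] >= 0 on the cone
    # Reversed Cauchy-Schwarz, squared form of (4.15): [x,x][y,y] <= [x,y]^2.
    assert lorentz(x, x) * lorentz(y, y) <= xy ** 2 + 1e-9

# Proportional points give equality, matching the case u x_j = t y_j.
x = [2.0, 1.0, 0.0, 1.0]
y = [3.0 * c for c in x]
print(lorentz(x, x) * lorentz(y, y), lorentz(x, y) ** 2)
```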


COMPLEX INNER PRODUCT SPACES 


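If $V$ is a complex vector space, such as $\mathbb{C}^d$ or the set of complex-valued
continuous functions on $[0, 1]$, then we say that a function on $V \times V$
defined by the mapping $(a, b) \mapsto \langle a, b\rangle \in \mathbb{C}$ is a complex inner product
and we say that $(V, \langle\cdot,\cdot\rangle)$ is a complex inner product space provided
that the pair $(V, \langle\cdot,\cdot\rangle)$ has five basic properties. The first four of these
perfectly parallel those required of a real inner product space:

(i) $\langle v, v\rangle \ge 0$ for all $v \in V$,
(ii) $\langle v, v\rangle = 0$ if and only if $v = 0$,
(iii) $\langle \alpha v, w\rangle = \alpha\langle v, w\rangle$ for all $\alpha \in \mathbb{C}$ and $v, w \in V$,
(iv) $\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle$ for all $u, v$ and $w \in V$,

but the fifth property requires a modest change; specifically, for a complex
inner product space we assume that

(v) $\langle v, w\rangle = \overline{\langle w, v\rangle}$ for all $v, w \in V$.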
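For $\mathbb{C}^d$ one standard choice consistent with properties (i)–(v) is $\langle a, b\rangle = \sum_j a_j \overline{b_j}$, which is linear in the first slot as (iii) requires (some texts put the conjugate on the first slot instead; that is a convention, not something the text prescribes). The Python sketch below, with hypothetical helper names, spot-checks (iii), (iv), and (v) on random vectors using Python's built-in `complex` type.

```python
import random

def cinner(a, b):
    """<a, b> = sum_j a_j * conj(b_j): linear in the first slot, as (iii) requires."""
    return sum(aj * bj.conjugate() for aj, bj in zip(a, b))

def rand_vec(rng, d=3):
    """Random vector in C^d with Gaussian real and imaginary parts."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]

def close(z, w, tol=1e-9):
    """Approximate equality of complex numbers."""
    return abs(z - w) <= tol * (1 + abs(w))

rng = random.Random(5)
u, v, w = rand_vec(rng), rand_vec(rng), rand_vec(rng)
alpha = complex(0.3, -1.7)

assert close(cinner([alpha * vj for vj in v], w), alpha * cinner(v, w))   # (iii)
assert close(cinner(u, [vj + wj for vj, wj in zip(v, w)]),
             cinner(u, v) + cinner(u, w))                                  # (iv)
assert close(cinner(v, w), cinner(w, v).conjugate())                      # (v)
assert cinner(v, v).imag == 0 and cinner(v, v).real > 0                    # (i), (ii), v != 0
# Without the conjugate in (v), the form would be linear in both slots and
# <i v, i v> = i * i * <v, v> = -<v, v>, which is incompatible with (i) and (ii).
```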


Problem 4.6 (Cauchy—Schwarz for a Complex Inner Product) 


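Show that in a complex inner product space $(V, \langle\cdot,\cdot\rangle)$ one has

$$|\langle v, w\rangle| \le \langle v, v\rangle^{1/2}\langle w, w\rangle^{1/2}. \tag{4.18}$$

Furthermore, show that if $v \ne 0$ then one has equality in the bound (4.18)
if and only if $w = \lambda v$ for some $\lambda \in \mathbb{C}$.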


A NATURAL PLAN AND A NEW OBSTACLE 


A natural plan for proving the Cauchy—Schwarz inequality for a com- 
plex inner product space is to mimic the proof for a real inner product 
space while paying attention to any changes which may be required by 
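the new “property (v).” Thus, we compute

$$\begin{aligned}
0 \le \langle v - w, v - w\rangle &= \langle v, v\rangle + \langle w, w\rangle - \langle v, w\rangle - \langle w, v\rangle \\
&= \langle v, v\rangle + \langle w, w\rangle - \big\{\langle v, w\rangle + \overline{\langle v, w\rangle}\big\} \\
&= \langle v, v\rangle + \langle w, w\rangle - 2\,\mathrm{Re}\,\langle v, w\rangle,
\end{aligned}$$

and we deduce that

$$\mathrm{Re}\,\langle v, w\rangle \le \frac{1}{2}\langle v, v\rangle + \frac{1}{2}\langle w, w\rangle, \tag{4.19}$$

where we have strict inequality unless $v = w$.

The additive bound (4.19) must be converted to one that is multiplicative.
If we call on the familiar normalization method and introduce
(for $v \ne 0$ and $w \ne 0$, since otherwise there is nothing to prove)

$$\hat{v} = v/\langle v, v\rangle^{1/2} \quad\text{and}\quad \hat{w} = w/\langle w, w\rangle^{1/2},$$

then arithmetic brings us quickly to the bound

$$\mathrm{Re}\,\langle v, w\rangle \le \langle v, v\rangle^{1/2}\langle w, w\rangle^{1/2}. \tag{4.20}$$

Unfortunately, this starts to look worrisome. We hoped to obtain a
bound on $|\langle v, w\rangle|$ but we have only found a bound on $\mathrm{Re}\,\langle v, w\rangle$, a term
which may be arbitrarily smaller than $|\langle v, w\rangle|$. Is it possible that this
approach has failed?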


SAVED BY A SELF-IMPROVEMENT 


The saving grace of inequality (4.20) is that it is of the self-improving 
kind. If we exploit its generality appropriately, we can derive an appar- 
ently stronger inequality. 

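If we write $\langle v, w\rangle = \rho e^{i\theta}$ with $\rho \ge 0$ and if we set $\tilde{v} = e^{-i\theta}v$, then the
properties of the complex inner product give us the identities

$$\langle \tilde{v}, \tilde{v}\rangle = \langle v, v\rangle \quad\text{and}\quad \langle \tilde{v}, w\rangle = \mathrm{Re}\,\langle \tilde{v}, w\rangle = |\langle v, w\rangle|,$$

so the real part bound (4.20) for the pair $\tilde{v}$ and $w$ gives us

$$|\langle v, w\rangle| = \mathrm{Re}\,\langle \tilde{v}, w\rangle \le \langle \tilde{v}, \tilde{v}\rangle^{1/2}\langle w, w\rangle^{1/2} = \langle v, v\rangle^{1/2}\langle w, w\rangle^{1/2}.$$

The outside terms yield the complex Cauchy–Schwarz inequality in
precisely the form we expected, so the bound (4.20) was strong enough
after all.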
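The rotation step can be watched numerically. In the Python sketch below (variable names hypothetical, inner product taken as $\sum_j a_j \overline{b_j}$ by assumption), `cmath.polar` supplies $\rho$ and $\theta$ with $\langle v, w\rangle = \rho e^{i\theta}$, and multiplying $v$ by $e^{-i\theta}$ makes the critical inner product real without changing $\langle v, v\rangle$.

```python
import cmath
import math
import random

def cinner(a, b):
    """<a, b> = sum_j a_j * conj(b_j), linear in the first argument."""
    return sum(aj * bj.conjugate() for aj, bj in zip(a, b))

rng = random.Random(6)
v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
w = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]

rho, theta = cmath.polar(cinner(v, w))        # <v, w> = rho * e^{i theta}, rho >= 0
v_tilde = [cmath.exp(-1j * theta) * vj for vj in v]

# The rotation makes <v_tilde, w> real and equal to |<v, w>| ...
assert abs(cinner(v_tilde, w) - rho) < 1e-9
# ... while leaving <v, v> unchanged.
assert abs(cinner(v_tilde, v_tilde) - cinner(v, v)) < 1e-9
# The real-part bound (4.20) for the pair (v_tilde, w) is then exactly (4.18).
bound = math.sqrt(cinner(v, v).real * cinner(w, w).real)
assert rho <= bound + 1e-12
print(f"|<v, w>| = {rho:.6f} <= {bound:.6f}")
```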


THE TRICK OF “MAKING IT REAL” 


In this argument, we faced an inequality which was made more compli- 
cated because of the presence of a real part. This is a common difficulty, 
and it is often addressed by the trick used here: one pre-multiplies by 
a well-chosen complex number in order to guarantee that some critical
quantity will be real. This is one of the most widely used maneuvers 
in the theory of complex inequalities, and it should never be far out of 
mind. 

Finally, to complete the solution of the challenge problem, we should 
confirm the alleged necessary and sufficient conditions for equality. Here 
it is honestly easy to retrace the steps of our argument to confirm the 
stated conditions, but, as we will discuss later (page 138), such backtrack 
arguments are not always trouble free. For the standard complex inner 
product one also has another option which is perhaps more satisfying; 
one can simply use the complex Lagrange identity (4.23) as suggested 
by Exercise 4.4. 


EXERCISES 


Exercise 4.1 (Triangle Inequality “Pot Shots”) 

The triangle inequality in $\mathbb{R}^d$ may seem obvious, but some of its consequences
can be puzzling when they are presented out of context. Here,
the next three exercises are not at all hard, but you might ask yourself, 
“Would these have been so easy yesterday?” 


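(a) Show for nonnegative $x, y, z$ that

$$(x + y + z)\sqrt{2} \le \sqrt{x^2 + y^2} + \sqrt{y^2 + z^2} + \sqrt{x^2 + z^2}.$$

(b) Show for $0 < x \le y \le z$ that

$$\sqrt{y^2 + z^2} \le x\sqrt{2} + \sqrt{(y - x)^2 + (z - x)^2}.$$

(c) Show for positive $x, y, z$ that

$$2\sqrt{3} \le \sqrt{x^2 + y^2 + z^2} + \sqrt{x^{-2} + y^{-2} + z^{-2}}.$$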


This list can be continued almost without limit, yet there is really only 
one theme: any time you see a sum of square roots in an inequality, 
you should give at least a moment’s thought to the possibility that the 
triangle inequality may help. 
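Before hunting for proofs, one may like to see the three inequalities of Exercise 4.1 hold on random inputs. The Python sketch below (function names hypothetical) only spot-checks them, with inputs sorted so that $0 < x \le y \le z$; it proves nothing and gives no hint about which vectors to add.

```python
import math
import random

def check_a(x, y, z):
    """(a): sqrt(2)(x + y + z) <= sqrt(x^2+y^2) + sqrt(y^2+z^2) + sqrt(x^2+z^2)."""
    lhs = math.sqrt(2) * (x + y + z)
    rhs = math.hypot(x, y) + math.hypot(y, z) + math.hypot(x, z)
    return lhs <= rhs + 1e-12

def check_b(x, y, z):
    """(b): sqrt(y^2+z^2) <= x sqrt(2) + sqrt((y-x)^2 + (z-x)^2)."""
    return math.hypot(y, z) <= x * math.sqrt(2) + math.hypot(y - x, z - x) + 1e-12

def check_c(x, y, z):
    """(c): 2 sqrt(3) <= sqrt(x^2+y^2+z^2) + sqrt(x^-2+y^-2+z^-2)."""
    rhs = math.sqrt(x * x + y * y + z * z) + math.sqrt(x ** -2 + y ** -2 + z ** -2)
    return 2 * math.sqrt(3) <= rhs + 1e-12

rng = random.Random(7)
for _ in range(10000):
    x, y, z = sorted(rng.uniform(0.01, 10) for _ in range(3))   # 0 < x <= y <= z
    assert check_a(x, y, z) and check_b(x, y, z) and check_c(x, y, z)
```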


Exercise 4.2 (The Geometry of “Steepest Ascent”)
If $f: \mathbb{R}^n \to \mathbb{R}$ is a differentiable function, then one often hears that
the gradient

$$\nabla f(x) = \Big(\frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \ldots, \frac{\partial f}{\partial x_n}(x)\Big)$$

points in the direction of steepest ascent of $f$ provided that $\nabla f(x) \ne 0$. In
longhand, this says for any unit vector $u$ one has the bound

$$\frac{d}{dt} f(x + tu)\Big|_{t=0} \le \frac{d}{dt} f(x + tv)\Big|_{t=0}, \tag{4.21}$$

where $v = \nabla f(x)/\|\nabla f(x)\|$. Prove this inequality and show that it is
strict unless $u = v$.

[Figure 4.6: the graph of $y(t) = \langle v - tw, v - tw\rangle = \langle v, v\rangle - 2t\langle v, w\rangle + t^2\langle w, w\rangle$,
with its minimum at $t_0 = \langle v, w\rangle/\langle w, w\rangle$.]

Fig. 4.6. By calculus, or by completing the square, one finds that the quadratic
polynomial $P(t) = \langle v - tw, v - tw\rangle$ takes its minimum at $t_0 = \langle v, w\rangle/\langle w, w\rangle$.
The nonnegativity of $h$ is enough to prove Cauchy's inequality for the $n$th
time, but geometry adds details which can be critical.
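A numerical illustration of (4.21) may help fix ideas. The Python sketch below uses a hypothetical test function $f(x, y) = x^2 + 3y$ (not from the text), estimates directional derivatives by central differences (an approximation, hence the small tolerance), and sweeps unit vectors $u$ around the circle; none beats the normalized gradient direction $v$.

```python
import math

def f(p):
    """Hypothetical test function f(x, y) = x^2 + 3y (not from the text)."""
    return p[0] ** 2 + 3 * p[1]

def directional_derivative(func, p, u, h=1e-6):
    """Central-difference estimate of d/dt func(p + t u) at t = 0."""
    plus = [pi + h * ui for pi, ui in zip(p, u)]
    minus = [pi - h * ui for pi, ui in zip(p, u)]
    return (func(plus) - func(minus)) / (2 * h)

p = [1.0, 2.0]
grad = [2 * p[0], 3.0]               # exact gradient of f at p
g = math.hypot(*grad)                # ||grad f(p)|| = sqrt(13)
v = [gi / g for gi in grad]          # unit vector along the gradient
best = directional_derivative(f, p, v)

# Sweep unit vectors u around the circle: none beats the gradient direction.
for k in range(360):
    s = math.radians(k)
    u = [math.cos(s), math.sin(s)]
    assert directional_derivative(f, p, u) <= best + 1e-6
print(best, g)
```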


Exercise 4.3 (Cauchy via Another Identity) 

Lagrange’s identity is not the only formula that gives an instant proof 
of Cauchy’s inequality. Check that in any real inner product space the
difference $\langle v, v\rangle\langle w, w\rangle - \langle v, w\rangle^2$ can be written, for $w \ne 0$, as

$$\langle w, w\rangle \left\langle v - \frac{\langle v, w\rangle}{\langle w, w\rangle}\, w,\; v - \frac{\langle v, w\rangle}{\langle w, w\rangle}\, w \right\rangle, \tag{4.22}$$

and explain why this also implies the general Cauchy–Schwarz inequality.
Incidentally, one does not need a flash of algebraic insight to discover
the representation (4.22). As Figure 4.6 suggests, this formula cannot
remain hidden for long once we ask ourselves about minimization of the
polynomial $P(t) = \langle v - tw, v - tw\rangle$.
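The minimization behind identity (4.22) is easy to check numerically. The Python sketch below works in $\mathbb{R}^5$ with the ordinary dot product (an assumption made only for concreteness; any real inner product would do), compares both sides of (4.22), and confirms that $t_0 = \langle v, w\rangle/\langle w, w\rangle$ beats nearby values of t.

```python
import random

def dot(u, v):
    """Ordinary dot product on R^d, standing in for a real inner product."""
    return sum(a * b for a, b in zip(u, v))

def P(t, v, w):
    """P(t) = <v - t w, v - t w>."""
    d = [a - t * b for a, b in zip(v, w)]
    return dot(d, d)

def cauchy_defect(v, w):
    """Left side of (4.22): <v,v><w,w> - <v,w>^2."""
    return dot(v, v) * dot(w, w) - dot(v, w) ** 2

def representation_422(v, w):
    """Right side of (4.22): <w,w> P(t0) with t0 = <v,w>/<w,w> (requires w != 0)."""
    t0 = dot(v, w) / dot(w, w)
    return dot(w, w) * P(t0, v, w)

rng = random.Random(1)
v = [rng.gauss(0, 1) for _ in range(5)]
w = [rng.gauss(0, 1) for _ in range(5)]
t0 = dot(v, w) / dot(w, w)

print(cauchy_defect(v, w), representation_422(v, w))  # equal up to rounding
print(all(P(t0, v, w) <= P(t0 + h, v, w) for h in (-0.1, -0.01, 0.01, 0.1)))
```

Since $P(t_0 + h) = P(t_0) + h^2\langle w, w\rangle$, the second check is really a restatement of the completed square.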


Exercise 4.4 (Lagrange’s Identity for Complex Numbers) 
Prove that for complex $a_k$ and $b_k$, $1 \le k \le n$, one has

$$\Big|\sum_{k=1}^n a_k b_k\Big|^2 = \sum_{k=1}^n |a_k|^2 \sum_{k=1}^n |b_k|^2 - \sum_{1 \le j < k \le n} |a_j \bar{b}_k - a_k \bar{b}_j|^2, \tag{4.23}$$

and show that this identity yields the complex Cauchy inequality

$$\Big|\sum_{k=1}^n a_k b_k\Big|^2 \le \sum_{k=1}^n |a_k|^2 \sum_{k=1}^n |b_k|^2$$

as well as the necessary and sufficient conditions for equality. Here one
should note that this identity does not follow from direct substitution
of complex numbers into Lagrange’s identity for real numbers; those
pesky absolute values get in the way. A slightly more sophisticated
approach is required.

Fig. 4.7. Ptolemy’s inequality and the condition for equality.
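The role of the conjugates in (4.23) shows up already in a tiny example. The Python sketch below computes $|\sum a_k b_k|^2$, the right side of (4.23), and, for contrast, the right side one gets by naively dropping the conjugates. For $a = b = (1, i)$ the sum $\sum a_k b_k$ vanishes; the conjugated identity reproduces 0, while the naive substitution gives 4.

```python
import random

def lagrange_terms(a, b):
    """Return |sum a_k b_k|^2, the right side of (4.23), and a naive variant.

    The naive variant uses |a_j b_k - a_k b_j|^2 (no conjugates), which is what
    direct substitution into the real Lagrange identity would suggest.
    """
    n = len(a)
    lhs = abs(sum(x * y for x, y in zip(a, b))) ** 2
    norms = sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b)
    pairs = [(j, k) for j in range(n) for k in range(j + 1, n)]
    with_conj = sum(abs(a[j] * b[k].conjugate() - a[k] * b[j].conjugate()) ** 2 for j, k in pairs)
    naive = sum(abs(a[j] * b[k] - a[k] * b[j]) ** 2 for j, k in pairs)
    return lhs, norms - with_conj, norms - naive

print(lagrange_terms([1, 1j], [1, 1j]))  # (0.0, 0.0, 4.0)

# Random complex vectors: the identity (4.23) should hold up to rounding.
rng = random.Random(2)
a = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(6)]
b = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(6)]
lhs, rhs, _ = lagrange_terms(a, b)
print(abs(lhs - rhs) < 1e-9)
```

Incidentally, the naive right side is always $|\sum a_k \bar{b}_k|^2$, which is why it agrees with the left side only by accident.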


Exercise 4.5 (A Vector-Scalar Melange) 

Consider real weights $p_j > 0$, $j = 1, 2, \ldots, n$, arbitrary real numbers
$a_j$, $j = 1, 2, \ldots, n$, and an inner product space $(V, \langle\cdot,\cdot\rangle)$. Find an analog
of Lagrange’s identity which suffices to prove that one has the inequality

$$\Big\|\sum_{j=1}^n p_j a_j x_j\Big\|^2 \le \sum_{j=1}^n p_j a_j^2 \sum_{k=1}^n p_k \|x_k\|^2 \tag{4.24}$$

for all $x_k$, $1 \le k \le n$, in V. Also, check that your identity implies that
equality holds if and only if we have $a_j x_k = a_k x_j$ for all $1 \le j, k \le n$.


Exercise 4.6 (Ptolemy’s Inequality) 

Ptolemy may be best known for founding a theory of planetary motion 
which was overturned by Copernicus, but parts of Ptolemy’s legacy have 
stood the test of time. Among these, Ptolemy has a namesake inequality 
which even today is a workhorse of the theory of geometric inequalities. 
Ptolemy’s inequality asserts that in a convex quadrilateral “the product 
of the diagonals is bounded by the sum of the products of the opposite 
sides,” or, in the notation of Figure 4.7, 


$$pq \le ac + bd. \tag{4.25}$$


Prove this inequality and show that equality holds if and only if the four 
vertices A, B, C, D are all on the circumference of a circle.
Exercise 4.7 (Representations of Complex Inner Products)

(a) If $\langle\cdot,\cdot\rangle$ is a complex inner product and if $\alpha \in \mathbb{C}$ and $\alpha^N = 1$ but
$\alpha^2 \ne 1$, then show that one has the representation

$$\langle x, y\rangle = \frac{1}{N}\sum_{n=0}^{N-1} \|x + \alpha^n y\|^2 \alpha^n, \tag{4.26}$$

where, as usual, $\|w\| = \langle w, w\rangle^{1/2}$.


(b) Similarly show that for any complex inner product one has 


$$\langle x, y\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} \|x + e^{i\theta} y\|^2 e^{i\theta}\, d\theta. \tag{4.27}$$


One benefit of identities such as these is that they may help us convert 
facts for $\|\cdot\|$ into facts for $\langle\cdot,\cdot\rangle$ or vice versa. One can say that these
are “just” variants of the polarization identity, but there are times when 
they are just the variant one needs. 


Exercise 4.8 (A Concrete Model of an Abstract Space) 

If $x_1, x_2, \ldots, x_n$ are linearly independent elements of the (real or
complex) inner product space $(V, \langle\cdot,\cdot\rangle)$, we define a new sequence $e_1, e_2, \ldots, e_n$
by setting $e_1 = x_1/\|x_1\|$ and by applying the two-part recursion

$$z_k = x_k - \sum_{j=1}^{k-1} \langle x_k, e_j\rangle e_j \quad\text{and}\quad e_k = \frac{z_k}{\|z_k\|} \tag{4.28}$$

for $k = 2, 3, \ldots, n$. This algorithm is known as the Gram–Schmidt
process, and it provides a systematic tool for reducing questions in an inner
product space to questions for real or complex numbers. In this exercise 
we develop the most basic properties of this process, and in the next 
four exercises we show how these properties are used in practice. 

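(a) Show that $\{e_k : 1 \le k \le n\}$ is an orthonormal sequence in the
sense that for all $1 \le j, k \le n$ one has

$$\langle e_j, e_k\rangle = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \ne k. \end{cases}$$

(b) Show that $\{x_k : 1 \le k \le n\}$ and $\{e_k : 1 \le k \le n\}$ satisfy the
triangular system of linear relations

$$\begin{aligned}
x_1 &= \langle x_1, e_1\rangle e_1 \\
x_2 &= \langle x_2, e_1\rangle e_1 + \langle x_2, e_2\rangle e_2 \\
&\;\;\vdots \\
x_n &= \langle x_n, e_1\rangle e_1 + \langle x_n, e_2\rangle e_2 + \cdots + \langle x_n, e_n\rangle e_n.
\end{aligned}$$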
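Recursion (4.28) translates almost line for line into code. The Python sketch below assumes real vectors in $\mathbb{R}^d$ with the ordinary dot product (the complex case would need a conjugate in the inner product) and uses the classical form of the recursion exactly as written; in floating point the modified Gram–Schmidt variant is usually preferred for numerical stability, but that refinement is beside the point here. The sample vectors are arbitrary.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(xs):
    """Apply recursion (4.28) to linearly independent real vectors xs."""
    es = []
    for x in xs:
        # z_k = x_k - sum_{j<k} <x_k, e_j> e_j  (for k = 1 this is just x_1)
        z = list(x)
        for e in es:
            c = dot(x, e)
            z = [zi - c * ei for zi, ei in zip(z, e)]
        norm = math.sqrt(dot(z, z))
        es.append([zi / norm for zi in z])  # e_k = z_k / ||z_k||
    return es

xs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
es = gram_schmidt(xs)

# Part (a): the matrix of inner products <e_j, e_k> is the identity (up to rounding).
print([[round(dot(ej, ek), 12) for ek in es] for ej in es])

# Part (b): x_k = <x_k, e_1> e_1 + ... + <x_k, e_k> e_k, the triangular system.
for k, x in enumerate(xs):
    rebuilt = [sum(dot(x, es[j]) * es[j][i] for j in range(k + 1)) for i in range(len(x))]
    print(max(abs(r - xi) for r, xi in zip(rebuilt, x)) < 1e-12)
```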


Exercise 4.9 (Gram—Schmidt Implies Cauchy—Schwarz) 

Apply the Gram–Schmidt process to the two-term sequence $\{x, y\}$ and
show that it reduces the inequality $|\langle x, y\rangle| \le \langle x, x\rangle^{1/2}\langle y, y\rangle^{1/2}$ to a bound
that is obvious. Thus, the Gram–Schmidt process gives an automatic
proof of the Cauchy–Schwarz inequality.


Exercise 4.10 (Gram—Schmidt Implies Bessel) 

If $\{y_k : 1 \le k \le n\}$ is an orthonormal sequence from a (real or
complex) inner product space $(V, \langle\cdot,\cdot\rangle)$, then Bessel’s inequality asserts
that

$$\sum_{k=1}^n |\langle x, y_k\rangle|^2 \le \langle x, x\rangle \quad\text{for all } x \in V. \tag{4.29}$$


Show that the Gram—Schmidt process yields a semi-automatic proof of 
Bessel’s inequality. Incidentally, one should also note that the case n = 1 
of Bessel’s inequality is equivalent to the Cauchy—Schwarz inequality. 


Exercise 4.11 (Gram—Schmidt and Products of Linear Forms) 
Use the Gram–Schmidt process for the three-term sequence $\{x, y, z\}$
to show that in a real inner product space one has

$$\langle x, y\rangle\langle x, z\rangle \le \tfrac{1}{2}\big(\langle y, z\rangle + \|y\|\,\|z\|\big)\|x\|^2, \tag{4.30}$$


a bound which we used earlier (page 61) to illustrate the use of isometries 
and projections. 


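Exercise 4.12 (A Gram–Schmidt Finale)
Show that if x, y, z are elements of a (real or complex) inner product
space V and if $\|x\| = \|y\| = \|z\| = 1$, then one has the inequality

$$|\langle x, x\rangle\langle y, z\rangle - \langle y, x\rangle\langle x, z\rangle|^2 \le \{\langle x, x\rangle^2 - |\langle x, y\rangle|^2\}\{\langle x, x\rangle^2 - |\langle x, z\rangle|^2\} \tag{4.31}$$

and the inequality

$$\begin{aligned}
\langle x, x\rangle^2\big(|\langle y, z\rangle|^2 + |\langle y, x\rangle|^2 + |\langle x, z\rangle|^2\big)
&\le \langle x, x\rangle^4 + \langle x, x\rangle\langle z, y\rangle\langle y, x\rangle\langle x, z\rangle \\
&\quad + \langle x, x\rangle\langle y, z\rangle\langle x, y\rangle\langle z, x\rangle.
\end{aligned} \tag{4.32}$$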


At first glance, these bounds may seem intimidating, but after one uses 
the Gram–Schmidt process to strip away the inner products, they are
just like the kind of bounds we have met many times before. 


Exercise 4.13 (Equivalence of Isometry and Orthonormality) 

This exercise shows how an important algebraic identity can be proved 
with help from the condition for equality in the Cauchy—Schwarz bound. 
The task is to show that if the $n \times n$ matrix A preserves the Euclidean
length of each v in $\mathbb{R}^n$ then its columns are orthonormal. In the useful
shorthand of matrix algebra, one needs to show

$$\|Av\| = \|v\| \ \text{ for all } v \in \mathbb{R}^n \implies A^T A = I,$$

where I is the identity matrix, $A^T$ is the transpose of A, and $\|v\|$ is the
Euclidean length of v.

As a hint, one might first show that $\|A^T v\| \le \|v\|$; that is, one might
show that the transpose $A^T$ does not increase length. One can then
argue that if the Cauchy–Schwarz inequality is applied to the inner product
$\langle v, A^T A v\rangle$ then equality actually holds.


5 


Consequences of Order 


One of the natural questions that accompanies any inequality is the 
possibility that it admits a converse of one sort or another. When we 
pose this question for Cauchy’s inequality, we find a challenge problem 
that is definitely worth our attention. It not only leads to results that 
are useful in their own right, but it also puts us on the path of one 
of the most fundamental principles in the theory of inequalities — the 
systematic exploitation of order relationships. 


Problem 5.1 (The Hunt for a Cauchy Converse) 
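Determine the circumstances which suffice for nonnegative real numbers
$a_k, b_k$, $k = 1, 2, \ldots, n$ to satisfy an inequality of the type

$$\Big(\sum_{k=1}^n a_k^2\Big)^{1/2}\Big(\sum_{k=1}^n b_k^2\Big)^{1/2} \le \rho \sum_{k=1}^n a_k b_k \tag{5.1}$$

for a given constant $\rho$.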


ORIENTATION 


Part of the challenge here is that the problem is not fully framed — 
there are circumstances and conditions that remain to be determined. 
Nevertheless, uncertainty is an inevitable part of research, and practice 
with modestly ambiguous problems can be particularly valuable. 

In such situations, one almost always begins with some experimentation,
and since the case n = 1 is trivial, the simplest case worth study
is given by taking the vectors (1, a) and (1, b) with a > 0 and b > 0. In
this case, the two sides of the conjectured Cauchy converse (5.1) relate
the quantities

$$(1 + a^2)^{1/2}(1 + b^2)^{1/2} \quad\text{and}\quad 1 + ab,$$

and this calculation already suggests a useful inference. If a and b are
chosen so that the product ab is held constant while $a \to \infty$, then
one finds that the right-hand expression is bounded, but the left-hand
expression is unbounded. This observation shows in essence that for a
given fixed value of $\rho \ge 1$ the conjecture (5.1) cannot hold unless the
ratios $a_k/b_k$ are required to be bounded from above and below.

Thus, we come to a more refined point of view, and we see that it is 
natural to conjecture that a bound of the type (5.1) will hold provided 
that the summands satisfy the ratio constraint 


$$m \le \frac{a_k}{b_k} \le M \quad\text{for all } k = 1, 2, \ldots, n, \tag{5.2}$$

for some constants $0 < m < M < \infty$. In this new interpretation of the
conjecture (5.1), one naturally permits $\rho$ to depend on the values of m
and M, though we would hope to show that $\rho$ can be chosen so that
it does not have any further dependence on the individual summands
$a_k$ and $b_k$. Now, the puzzle is to find a way to exploit the betweenness
bounds (5.2).


EXPLOITATION OF BETWEENNESS 


When we look at our unknown (the conjectured inequality) and then 
look at the given (the betweenness bounds), we may have the lucky 
idea of hunting for clues in our earlier proofs of Cauchy’s inequality. In 
particular, if we recall the proof that took $(a - b)^2 \ge 0$ as its departure
point, we might start to suspect that an analogous idea could help
here. Is there some way to obtain a useful quadratic bound from the
betweenness relation (5.2)?

Once the question is put so bluntly, one does not need long to notice
that the two-sided bound (5.2) gives us a cheap quadratic bound

$$\Big(M - \frac{a_k}{b_k}\Big)\Big(\frac{a_k}{b_k} - m\Big) \ge 0. \tag{5.3}$$

Although one cannot tell immediately if this observation will help, the
analogy with the earlier success of the trivial bound $(a - b)^2 \ge 0$ provides
ground for optimism.

At a minimum, we should have the confidence needed to unwrap the
bound (5.3), multiplying through by $b_k^2$, to find the equivalent inequality

$$a_k^2 + (mM)\, b_k^2 \le (m + M)\, a_k b_k \quad\text{for all } k = 1, 2, \ldots, n. \tag{5.4}$$


Now we seem to be in luck; we have found a bound on a sum of squares 
by a product, and this is precisely what a converse to Cauchy’s inequality 
requires. The eventual role to be played by M and m is still uncertain,
but the scent of progress is in the air. 

The bounds (5.4) call out to be summed over $1 \le k \le n$, and, upon
summing, the factors mM and m + M come out neatly to give us

$$\sum_{k=1}^n a_k^2 + (mM)\sum_{k=1}^n b_k^2 \le (m + M)\sum_{k=1}^n a_k b_k, \tag{5.5}$$
which is a fine additive bound. Thus, we face a problem of a kind we 
have met before — we need to convert an additive bound to one that is 
multiplicative. 


PASSAGE TO A PRODUCT 


If we cling to our earlier pattern, we might now be tempted to introduce
normalized variables $\hat{a}_k$ and $\hat{b}_k$, but this time normalization runs
into trouble. The problem is that the inequality (5.5) may be applied
to $\hat{a}_k$ and $\hat{b}_k$ only if they satisfy the ratio bound $m \le \hat{a}_k/\hat{b}_k \le M$, and
these constraints rule out the natural candidates for the normalizations
$\hat{a}_k$ and $\hat{b}_k$. We need a new idea for passing to a product.

Conceivably, one might get stuck here, but help is close at hand pro- 
vided that we pause to ask clearly what is needed — which is just a 
lower bound for a sum of two expressions by a product of their square 
roots. Once this is said, one can hardly fail to think of using the AM- 
GM inequality, and when it is applied to the additive bound (5.5), one 
finds 

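$$\begin{aligned}
\Big(\sum_{k=1}^n a_k^2\Big)^{1/2}\Big(mM\sum_{k=1}^n b_k^2\Big)^{1/2}
&\le \frac{1}{2}\Big\{\sum_{k=1}^n a_k^2 + mM\sum_{k=1}^n b_k^2\Big\} \\
&\le \frac{1}{2}(m + M)\sum_{k=1}^n a_k b_k.
\end{aligned}$$

Now, after dividing by $\sqrt{mM}$ and a little rearranging, we come to the inequality that
completes our quest. Thus, if we set

$$A = \frac{m + M}{2} \quad\text{and}\quad G = \sqrt{mM}, \tag{5.6}$$

then, for all nonnegative $a_k, b_k$, $k = 1, 2, \ldots, n$ with

$$0 < m \le \frac{a_k}{b_k} \le M < \infty,$$

we find that we have established the bound

$$\Big(\sum_{k=1}^n a_k^2\Big)^{1/2}\Big(\sum_{k=1}^n b_k^2\Big)^{1/2} \le \frac{A}{G}\sum_{k=1}^n a_k b_k; \tag{5.7}$$

thus, in the end, one sees that there is indeed a natural converse to
Cauchy’s inequality.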
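The converse (5.7) is easy to test on data that obeys the ratio constraint (5.2). The Python sketch below draws random sequences with $m \le a_k/b_k \le M$ (the values m = 0.5, M = 3, the ranges, and the sample sizes are arbitrary choices) and checks (5.7); it also evaluates one case of equality, in which every ratio sits at an endpoint and $\sum a_k^2 = mM\sum b_k^2$, so that both steps of the AM-GM chain above are equalities.

```python
import math
import random

def converse_cauchy_sides(a, b, m, M):
    """Return both sides of (5.7) for sequences with m <= a_k/b_k <= M."""
    A = (m + M) / 2        # arithmetic mean of m and M, as in (5.6)
    G = math.sqrt(m * M)   # geometric mean of m and M, as in (5.6)
    lhs = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    rhs = (A / G) * sum(x * y for x, y in zip(a, b))
    return lhs, rhs

# Random data obeying the ratio constraint (5.2) with m = 0.5, M = 3.
rng = random.Random(3)
m, M = 0.5, 3.0
ok = True
for _ in range(2000):
    b = [rng.uniform(0.1, 5.0) for _ in range(8)]
    a = [bk * rng.uniform(m, M) for bk in b]
    lhs, rhs = converse_cauchy_sides(a, b, m, M)
    ok = ok and lhs <= rhs * (1 + 1e-12)
print(ok)

# A case of equality: m = 1, M = 4, a = (4, 2), b = (1, 2); both sides are 10.
print(converse_cauchy_sides([4.0, 2.0], [1.0, 2.0], 1.0, 4.0))
```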


ON THE CONVERSION OF INFORMATION 


When one looks back on the proof of the converse Cauchy inequality 
(5.7), one may be struck by how quickly progress followed once the two
order relationships, $m \le a_k/b_k$ and $a_k/b_k \le M$, were put together to
build the simple quadratic inequality $(M - a_k/b_k)(a_k/b_k - m) \ge 0$. In
the context of a single example, this could just be a lucky accident, but
something deeper is afoot.

In fact, the device of order-to-quadratic conversion is a remarkably
versatile tool with a wide range of applications. The next few challenge
problems illustrate some of these that are of independent interest.


MONOTONICITY AND CHEBYSHEV’S “ORDER INEQUALITY” 


One way to put a large collection of order relationships at your fin- 
gertips is to focus your attention on monotone sequences and monotone 
functions. This suggestion is so natural that it might not stir high hopes, 
but in fact it does lead to an important result with many applications, 
especially in probability and statistics. 

The result is due to Pafnuty Lvovich Chebyshev (1821-1894) who 
apparently had his first exposure to probability theory from our earlier 
acquaintance Victor Yacovlevich Bunyakovsky. Probability theory was 
one of those hot new mathematical topics which Bunyakovsky brought 
back to St. Petersburg when he returned from his student days studying 
with Cauchy in Paris. Another topic was the theory of complex variables 
which we will engage a bit later. 


Problem 5.2 (Chebyshev’s Order Inequality) 

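Suppose that $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ are nondecreasing and
suppose $p_j \ge 0$, $j = 1, 2, \ldots, n$, satisfy $p_1 + p_2 + \cdots + p_n = 1$. Show
that for any nondecreasing sequence $x_1 \le x_2 \le \cdots \le x_n$ one has the
inequality

$$\Big\{\sum_{k=1}^n f(x_k)p_k\Big\}\Big\{\sum_{k=1}^n g(x_k)p_k\Big\} \le \sum_{k=1}^n f(x_k)g(x_k)p_k. \tag{5.8}$$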


CONNECTIONS TO PROBABILITY AND STATISTICS 

The inequality (5.8) is easily understood without relying on its connec- 
tion to probability theory, and it has many applications in other areas of 
mathematics. Nevertheless, the probabilistic interpretation of the bound 
(5.8) is particularly compelling. In the language of probability, it says
that if X is a random variable for which one has $P(X = x_k) = p_k$ for
$k = 1, 2, \ldots, n$, then

$$E[f(X)]\,E[g(X)] \le E[f(X)\,g(X)], \tag{5.9}$$

where, as usual, P stands for probability and E stands for the
mathematical expectation. In other words, if random variables Y and Z may
be written as nondecreasing functions of a single random variable X, 
then Y and Z must be nonnegatively correlated. Without Chebyshev’s 
inequality, the intuition that is commonly attached to the statistical 
notion of correlation would stand on shaky ground. 

Incidentally, there is another inequality due to Chebyshev that is even 
more important in probability theory; it tells us that for any random 
variable X with a finite mean $\mu = E(X)$ and any $\lambda > 0$ one has the bound

$$P(|X - \mu| \ge \lambda) \le \frac{1}{\lambda^2} E(|X - \mu|^2). \tag{5.10}$$

The proof of this bound is almost trivial, especially with the hint offered
in Exercise 5.11, but it is such a day-to-day workhorse in probability
theory that Chebyshev’s order inequality (5.9) is often jokingly called
Chebyshev’s other inequality.


A PROOF FROM OUR POCKET 


Chebyshev’s inequality (5.8) is quadratic, and the hypotheses provide 
order information, so, even if one were to meet Chebyshev’s inequality 
(5.8) in a dark alley, the order-to-quadratic conversion is likely to come 
to mind. Here the monotonicity of f and g gives us the quadratic bound

$$0 \le \{f(x_k) - f(x_j)\}\{g(x_k) - g(x_j)\},$$

and this may be expanded in turn to give

$$f(x_k)g(x_j) + f(x_j)g(x_k) \le f(x_j)g(x_j) + f(x_k)g(x_k). \tag{5.11}$$


From this point, we only need to bring the $p_j$’s into the picture and
meekly agree to take whatever arithmetic gives us.

Thus, when we multiply the bound (5.11) by $p_j p_k$ and sum over
$1 \le j \le n$ and $1 \le k \le n$, we find that the left-hand sum gives us

$$\sum_{j,k=1}^n \{f(x_k)g(x_j) + f(x_j)g(x_k)\}\,p_j p_k = 2\Big\{\sum_{k=1}^n f(x_k)p_k\Big\}\Big\{\sum_{k=1}^n g(x_k)p_k\Big\},$$

while, since $p_1 + p_2 + \cdots + p_n = 1$, the right-hand sum gives us

$$\sum_{j,k=1}^n \{f(x_j)g(x_j) + f(x_k)g(x_k)\}\,p_j p_k = 2\sum_{k=1}^n f(x_k)g(x_k)p_k.$$

Thus, the bound between the summands (5.11) does indeed yield the 
proof of Chebyshev’s inequality. 
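The proof actually delivers an identity: the right side of (5.8) minus the left side equals $\frac{1}{2}\sum_{j,k} p_j p_k \{f(x_k) - f(x_j)\}\{g(x_k) - g(x_j)\}$, a sum of nonnegative terms. The Python sketch below checks this numerically; the choice of `math.exp` and `math.atan` as the two nondecreasing functions, and the random points and weights, are arbitrary illustrations.

```python
import math
import random

def chebyshev_gap(f, g, xs, ps):
    """Right side minus left side of (5.8)."""
    ef = sum(p * f(x) for x, p in zip(xs, ps))
    eg = sum(p * g(x) for x, p in zip(xs, ps))
    efg = sum(p * f(x) * g(x) for x, p in zip(xs, ps))
    return efg - ef * eg

def pairwise_form(f, g, xs, ps):
    """Half the p_j p_k-weighted sum of the quadratic terms behind (5.11)."""
    n = len(xs)
    return 0.5 * sum(ps[j] * ps[k] * (f(xs[k]) - f(xs[j])) * (g(xs[k]) - g(xs[j]))
                     for j in range(n) for k in range(n))

rng = random.Random(4)
xs = sorted(rng.uniform(-2, 2) for _ in range(7))
raw = [rng.random() for _ in range(7)]
ps = [r / sum(raw) for r in raw]          # nonnegative weights summing to 1

gap = chebyshev_gap(math.exp, math.atan, xs, ps)   # both functions nondecreasing
print(gap >= 0, abs(gap - pairwise_form(math.exp, math.atan, xs, ps)) < 1e-12)
```

If one of the two functions is nonincreasing instead, each pairwise term becomes nonpositive and the inequality (5.8) reverses.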


ORDER, FACILITY, AND SUBTLETY 


The proof of Chebyshev’s inequality leads us to a couple of observa- 
tions. First, there are occasions when the application of the order-to- 
quadratic conversion is an automatic, straightforward affair. Even so, 
the conversion has led to some remarkable results, including the versa- 
tile rearrangement inequality which is developed in our next challenge 
problem. The rearrangement inequality is not much harder to prove 
than Chebyshev’s inequality, but some of its consequences are simply 
stunning. Here, and subsequently, we let [n] denote the set {1,2,...,n}, 
and we recall that a permutation of [n] is just a one-to-one mapping 
from [n] into [n]. 


Problem 5.3 (The Rearrangement Inequality) 
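Show that for each pair of ordered real sequences

$$-\infty < a_1 \le a_2 \le \cdots \le a_n < \infty \quad\text{and}\quad -\infty < b_1 \le b_2 \le \cdots \le b_n < \infty$$

and for each permutation $\sigma : [n] \to [n]$, one has

$$\sum_{k=1}^n a_k b_{n-k+1} \le \sum_{k=1}^n a_k b_{\sigma(k)} \le \sum_{k=1}^n a_k b_k. \tag{5.12}$$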


AUTOMATIC — BUT STILL EFFECTIVE 

This problem offers us a hypothesis that provides order relations and 
asks us for a conclusion that is quadratic. This familiar combination 
may tempt one just to dive in, but sometimes it pays to be patient.
After all, the statement of the rearrangement inequality is a bit involved, 
and one probably does well to first consider the simplest case n = 2. 

In this case, the order-to-quadratic conversion reminds us that 


$$a_1 \le a_2 \ \text{ and } \ b_1 \le b_2 \quad\text{imply}\quad 0 \le (a_2 - a_1)(b_2 - b_1),$$

and when this is unwrapped, we find

$$a_1 b_2 + a_2 b_1 \le a_1 b_1 + a_2 b_2,$$
which is precisely the rearrangement inequality (5.12) for n = 2. Nothing
could be easier than this warm-up case; the issue now is to see if a similar
idea can be used to deal with the more general sums

$$S(\sigma) = \sum_{k=1}^n a_k b_{\sigma(k)}.$$


INVERSIONS AND THEIR REMOVAL 
If $\sigma$ is not the identity permutation, then there must exist some pair
$j < k$ such that $\sigma(k) < \sigma(j)$. Such a pair is called an inversion, and
the observation that one draws from the case n = 2 is that if we switch
the values of $\sigma(k)$ and $\sigma(j)$, then the value of the associated sum will
increase, or at least not decrease. To make this idea formal, we first
introduce a new permutation $\tau$ by the recipe

$$\tau(i) = \begin{cases} \sigma(i) & \text{if } i \ne j \text{ and } i \ne k \\ \sigma(j) & \text{if } i = k \\ \sigma(k) & \text{if } i = j \end{cases} \tag{5.13}$$

which is illustrated in Figure 5.1. By the definition of $\tau$ and by factorization, we then find

$$\begin{aligned}
S(\tau) - S(\sigma) &= a_j b_{\tau(j)} + a_k b_{\tau(k)} - a_j b_{\sigma(j)} - a_k b_{\sigma(k)} \\
&= a_j b_{\tau(j)} + a_k b_{\tau(k)} - a_j b_{\tau(k)} - a_k b_{\tau(j)} \\
&= (a_k - a_j)(b_{\tau(k)} - b_{\tau(j)}) \ge 0,
\end{aligned}$$

where the final inequality holds because $a_j \le a_k$ and $\tau(j) = \sigma(k) < \sigma(j) = \tau(k)$.
Thus, the transformation $\sigma \mapsto \tau$ achieves two goals; first, it does not decrease S,
so $S(\sigma) \le S(\tau)$, and second, the number of inversions of $\tau$ is forced to
be strictly fewer than the number of inversions of the permutation $\sigma$.


REPEATING THE PROCESS — CLOSING THE LOOP 


A permutation has at most $n(n - 1)/2$ inversions and only the identity
permutation has no inversions, so there exists a finite sequence of
inversion-removing transformations that move in sequence from $\sigma$ to the
identity. If we denote these by $\sigma = \sigma_0, \sigma_1, \ldots, \sigma_m$ where $\sigma_m$ is the identity
and $m \le n(n - 1)/2$, then, by applying the bound $S(\sigma_{i-1}) \le S(\sigma_i)$
for $i = 1, 2, \ldots, m$, we find

$$S(\sigma) \le \sum_{k=1}^n a_k b_k.$$


This completes the proof of the upper half of the rearrangement inequal- 
ity (5.12). 
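The proof is really an algorithm, so it can be run. The Python sketch below uses 0-based indices and, as an arbitrary choice, always removes the first inversion in lexicographic order (any inversion would do); it records the successive values of S and the successive inversion counts. The sample sequences and the starting permutation are made up for illustration.

```python
def S(a, b, sigma):
    """S(sigma) = sum_k a_k b_{sigma(k)}, with 0-based indices."""
    return sum(a[k] * b[sigma[k]] for k in range(len(a)))

def inversions(sigma):
    """All pairs j < k with sigma(k) < sigma(j)."""
    n = len(sigma)
    return [(j, k) for j in range(n) for k in range(j + 1, n) if sigma[k] < sigma[j]]

def remove_inversions(a, b, sigma):
    """Repeat the interchange (5.13) until sigma becomes the identity.

    Returns the final permutation, the successive values of S, and the
    successive inversion counts.
    """
    sigma = list(sigma)
    values, counts = [S(a, b, sigma)], [len(inversions(sigma))]
    while counts[-1] > 0:
        j, k = inversions(sigma)[0]
        sigma[j], sigma[k] = sigma[k], sigma[j]  # tau(j) = sigma(k), tau(k) = sigma(j)
        values.append(S(a, b, sigma))
        counts.append(len(inversions(sigma)))
    return sigma, values, counts

a = [1, 2, 3, 4]
b = [-1, 0, 2, 5]
final, values, counts = remove_inversions(a, b, [3, 1, 0, 2])
print(final)   # [0, 1, 2, 3]
print(values)  # [10, 15, 17, 22, 25]
print(counts)  # [4, 3, 2, 1, 0]
```

The values of S never decrease and the inversion counts strictly decrease, exactly as the two goals of the interchange promise; the final value 25 is $\sum a_k b_k$.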
Fig. 5.1. (Diagram: $a_1, a_2, \ldots, a_n$ paired first with $b_{\sigma(1)}, \ldots, b_{\sigma(n)}$ and then with
$b_{\tau(1)}, \ldots, b_{\tau(n)}$, where only the entries in positions j and k are exchanged.)
An interchange operation converts the permutation $\sigma$ to a permutation $\tau$. By design,
the new permutation $\tau$ has fewer inversions than $\sigma$; by calculation, one also finds
that $S(\sigma) \le S(\tau)$.


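The easy way to get the lower half is then to notice that it is an
immediate consequence of the upper half. Thus, if we consider
$b'_1 = -b_n, b'_2 = -b_{n-1}, \ldots, b'_n = -b_1$, we see that

$$b'_1 \le b'_2 \le \cdots \le b'_n,$$

and, by the upper half of the rearrangement inequality (5.12) applied to
the sequence $b'_1, b'_2, \ldots, b'_n$, we get the lower half of the inequality (5.12)
for the sequence $b_1, b_2, \ldots, b_n$. Explicitly, since $b'_i = -b_{n+1-i}$, the upper half gives
$-\sum_k a_k b_{n+1-\sigma(k)} \le -\sum_k a_k b_{n+1-k}$, and $k \mapsto n + 1 - \sigma(k)$ runs over all
permutations of [n] as $\sigma$ does.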


LOOKING BACK — TESTING NEW PROBES


The statement of the rearrangement inequality is exceptionally natu- 
ral, and it does not provide us with any obvious loose ends. We might 
look back on it many times and never think of any useful variations 
of either its statement or its proof. Nevertheless, such variations can 
always be found; one just needs to use the right probes. 

Obviously, no single probe, or even any set of probes, can lead with 
certainty to a useful variation of a given result, but there are a few 
generic questions that are almost always worth our time. One of the 
best of these asks: “Is there a nonlinear version of this result?” 

Here, to make sense of this question, we first need to notice that the 
rearrangement inequality is a statement about sums of linear functions 
of the ordered n-tuples 


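$$\{b_{n-k+1}\}_{1 \le k \le n}, \quad \{b_{\sigma(k)}\}_{1 \le k \le n}, \quad\text{and}\quad \{b_k\}_{1 \le k \le n},$$

where the “linear functions” are simply the n mappings given by
$x \mapsto a_k x$, $k = 1, 2, \ldots, n$.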


Such simple linear maps are usually not worth naming, but here we have 
a higher purpose in mind. In particular, with this identification behind 
us, we may not need long to think of some ways that the monotonicity 
condition $a_k \le a_{k+1}$ might be re-expressed.
Several variations of the rearrangement inequality may come to mind,
and our next challenge problem explores one of the simplest of these. 
It was first studied by A. Vince, and it has several informative conse- 
quences. 


Problem 5.4 (A Nonlinear Rearrangement Inequality) 
Let $f_1, f_2, \ldots, f_n$ be functions from the interval I into $\mathbb{R}$ such that

$$f_{k+1}(x) - f_k(x) \ \text{ is nondecreasing for all } 1 \le k < n. \tag{5.14}$$

Let $b_1 \le b_2 \le \cdots \le b_n$ be an ordered sequence of elements of I, and
show that for each permutation $\sigma : [n] \to [n]$, one has the bound

$$\sum_{k=1}^n f_k(b_{n-k+1}) \le \sum_{k=1}^n f_k(b_{\sigma(k)}) \le \sum_{k=1}^n f_k(b_k). \tag{5.15}$$


TESTING THE WATERS 


This problem is intended to generalize the rearrangement inequality, 
and we see immediately that it does when we identify $f_k(x)$ with the
map $x \mapsto a_k x$. To be sure, there are far more interesting nonlinear
examples which one can find after even a little experimentation.

For instance, one might take $a_1 \le a_2 \le \cdots \le a_n$ and consider the
functions $x \mapsto \log(a_k + x)$ on an interval I where each $a_k + x > 0$. Here one finds

$$\log(a_{k+1} + x) - \log(a_k + x) = \log\Big(\frac{a_{k+1} + x}{a_k + x}\Big),$$

and if we set $r(x) = (a_{k+1} + x)/(a_k + x)$, then direct calculation gives

$$r'(x) = \frac{a_k - a_{k+1}}{(a_k + x)^2} \le 0,$$

so, if we take

$$f_k(x) = -\log(a_k + x) \quad\text{for } k = 1, 2, \ldots, n,$$

then condition (5.14) is satisfied. Thus, by Vince’s inequality and exponentiation
one finds for each permutation $\sigma : [n] \to [n]$ that

$$\prod_{k=1}^n (a_k + b_k) \le \prod_{k=1}^n (a_k + b_{\sigma(k)}) \le \prod_{k=1}^n (a_k + b_{n-k+1}). \tag{5.16}$$


This interesting product bound (5.16) shows that there is power in 
Vince’s inequality, though in this particular case the bound was known 
earlier. Still, we see that a proof of Vince’s inequality will be worth our 
time — even if only because of the corollary (5.16). 
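For small n the product bound (5.16) can be checked by brute force over all permutations. The Python sketch below uses made-up integer data (so the arithmetic is exact) with all sums $a_k + b_j$ positive, as the logarithms require; it returns the two extreme products and whether every permutation lands between them.

```python
import itertools
import math

def product_over(a, b, sigma):
    """prod_k (a_k + b_{sigma(k)}) with 0-based indices."""
    return math.prod(a[k] + b[sigma[k]] for k in range(len(a)))

def check_516(a, b):
    """Check (5.16) over all permutations; a and b must be sorted ascending."""
    n = len(a)
    lo = product_over(a, b, range(n))                # similarly ordered pairing
    hi = product_over(a, b, range(n - 1, -1, -1))    # oppositely ordered pairing
    ok = all(lo <= product_over(a, b, s) <= hi for s in itertools.permutations(range(n)))
    return lo, hi, ok

print(check_516([1, 2, 4, 7], [0, 3, 5, 6]))  # (585, 2401, True)
```

Note the reversal compared with the linear rearrangement inequality: because $f_k(x) = -\log(a_k + x)$, the similarly ordered pairing now gives the smallest product rather than the largest sum.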
RECYCLING AN ALGORITHMIC PROOF

If we generalize our earlier sums and write

$$S(\sigma) = \sum_{k=1}^n f_k(b_{\sigma(k)}),$$

then we already know from the definition (5.13) and discussion of the
inversion-decreasing transformation $\sigma \mapsto \tau$ that we only need to show
$S(\sigma) \le S(\tau)$.


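Now, almost as before, we calculate the difference

$$\begin{aligned}
S(\tau) - S(\sigma) &= f_j(b_{\tau(j)}) + f_k(b_{\tau(k)}) - f_j(b_{\sigma(j)}) - f_k(b_{\sigma(k)}) \\
&= f_j(b_{\tau(j)}) + f_k(b_{\tau(k)}) - f_j(b_{\tau(k)}) - f_k(b_{\tau(j)}) \\
&= \{f_k(b_{\tau(k)}) - f_j(b_{\tau(k)})\} - \{f_k(b_{\tau(j)}) - f_j(b_{\tau(j)})\} \ge 0,
\end{aligned}$$

and this time the last inequality comes from $b_{\tau(j)} \le b_{\tau(k)}$ and our hypothesis,
which implies that $f_k(x) - f_j(x)$ is a nondecreasing function of $x \in I$ for $j < k$,
since it is a sum of the nondecreasing differences $f_{i+1} - f_i$. From
this relation, one then sees that no further change is needed in our earlier
arguments, and the proof of the nonlinear version of the rearrangement
inequality is complete.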


EXERCISES 


Exercise 5.1 (Baseball and Cauchy’s Third Inequality) 

In the remarkable Note II of 1821 where Cauchy proved both his
namesake inequality and the fundamental AM-GM bound, one finds a
third inequality which is not as notable nor as deep but which is still
handy from time to time. The inequality asserts that for any positive
real numbers $h_1, h_2, \ldots, h_n$ and $b_1, b_2, \ldots, b_n$ one has the ratio bounds

$$\min_{1 \le j \le n} \frac{h_j}{b_j} \le \frac{h_1 + h_2 + \cdots + h_n}{b_1 + b_2 + \cdots + b_n} \le \max_{1 \le j \le n} \frac{h_j}{b_j}. \tag{5.17}$$

Sports enthusiasts may imagine, as Cauchy never would, that $b_j$ denotes
the number of times a baseball player j goes to bat, and $h_j$ denotes the
number of times he gets a hit. The inequality confirms the intuitive fact
that the batting average of a team is never worse than that of its worst
hitter and never better than that of its best hitter.

Prove the inequality (5.17) and put it to honest mathematical use by
proving that for any polynomial $P(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$
with positive coefficients one has the monotonicity relation

$$0 < x \le y \implies \Big(\frac{x}{y}\Big)^n \le \frac{P(x)}{P(y)} \le 1.$$
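Exact rational arithmetic makes (5.17) easy to watch in action. The Python sketch below uses the standard-library `fractions` module on a made-up three-player team, and then on the specialization $h_k = c_k x^k$, $b_k = c_k y^k$ that turns (5.17) into a statement about $P(x)/P(y)$; the polynomial $1 + 3t + 2t^2$ and the points x = 1, y = 2 are arbitrary choices.

```python
from fractions import Fraction

def ratio_bounds(h, b):
    """Return (min h_j/b_j, (h_1+...+h_n)/(b_1+...+b_n), max h_j/b_j) exactly."""
    ratios = [Fraction(hj) / Fraction(bj) for hj, bj in zip(h, b)]
    return min(ratios), Fraction(sum(h)) / Fraction(sum(b)), max(ratios)

# A made-up team: hits h_j and at-bats b_j for three players.
lo, team, hi = ratio_bounds([3, 1, 2], [10, 4, 5])
print(lo, team, hi)  # 1/4 6/19 2/5

# The polynomial application with h_k = c_k x^k and b_k = c_k y^k.
c = [1, 3, 2]                       # P(t) = 1 + 3t + 2t^2, so n = 2
x, y = Fraction(1), Fraction(2)
lo, ratio, hi = ratio_bounds([ck * x**k for k, ck in enumerate(c)],
                             [ck * y**k for k, ck in enumerate(c)])
print(lo, ratio, hi)  # 1/4 2/5 1
```

Here the individual ratios are $(x/y)^k$ for $k = 0, 1, \ldots, n$, so the smallest is $(x/y)^n = 1/4$ and the largest is 1, while $P(1)/P(2) = 6/15 = 2/5$ lies between them.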


Exercise 5.2 (Betweenness and an Inductive Proof of AM-GM) 

One can build an inductive proof of the basic AM-GM inequality 
(2.3) by exploiting the conversion of an order relation to a quadratic 
bound. To get started, first consider $0 < a_1 \le a_2 \le \cdots \le a_n$, set
$A = (a_1 + a_2 + \cdots + a_n)/n$, and then show that one has

$$\frac{a_1 a_n}{A} \le a_1 + a_n - A.$$

Now, complete the induction step of the AM-GM proof by considering
the (n − 1)-element set $S = \{a_2, a_3, \ldots, a_{n-1}\} \cup \{a_1 + a_n - A\}$.


Exercise 5.3 (Cauchy—Schwarz and the Cross-Term Defect) 
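If u and v are elements of the real inner product space V for which
one has the upper bounds

$$\langle u, u\rangle \le A^2 \quad\text{and}\quad \langle v, v\rangle \le B^2,$$

then Cauchy’s inequality tells us $\langle u, v\rangle \le AB$. Show that one then also
has a lower bound on the cross-term difference $AB - \langle u, v\rangle$, namely,

$$\{A^2 - \langle u, u\rangle\}^{1/2}\{B^2 - \langle v, v\rangle\}^{1/2} \le AB - \langle u, v\rangle. \tag{5.18}$$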


Exercise 5.4 (A Remarkable Inequality of I. Schur) 
Show that for all values of $x, y, z \ge 0$, one has for all $\alpha > 0$ that

$$x^\alpha(x - y)(x - z) + y^\alpha(y - x)(y - z) + z^\alpha(z - x)(z - y) \ge 0. \tag{5.19}$$

Moreover, show that one has equality here if and only if one has either
x = y = z or two of the variables are equal and the third is zero.

Schur’s inequality can sometimes save the day in problems where the
AM-GM inequality looks like the natural tool, yet it comes up short.
Sometimes the two-pronged condition for equality also provides a clue 
that Schur’s inequality may be of help. 


Exercise 5.5 (The Pólya–Szegő Converse Restructured)
The converse Cauchy inequality (5.7) is expressed with the aid of
bounds on the ratios $a_k/b_k$, but for many applications it is useful to know
that one also has a natural converse under the more straightforward
hypothesis that

$$0 < a \le a_k \le A \quad\text{and}\quad 0 < b \le b_k \le B \quad\text{for all } k = 1, 2, \ldots, n.$$

Use the Cauchy converse (5.7) to prove that in this case one has

$$\Big\{\sum_{k=1}^n a_k^2 \sum_{k=1}^n b_k^2\Big\} \Big/ \Big\{\sum_{k=1}^n a_k b_k\Big\}^2 \le \frac{1}{4}\left(\sqrt{\frac{AB}{ab}} + \sqrt{\frac{ab}{AB}}\right)^2.$$


Exercise 5.6 (A Competition Perennial) 
Show that if a > 0, b > 0, and c > 0 then one has the elegant
symmetric bound

$$\frac{3}{2} \le \frac{a}{b + c} + \frac{b}{a + c} + \frac{c}{a + b}. \tag{5.20}$$


This is known as Nesbitt’s inequality, and along with several natural 
variations, it has served a remarkable number of mathematical compe- 
titions, from Moscow in 1962 to the Canadian Maritimes in 2002. 


Exercise 5.7 (Rearrangement, Cyclic Shifts, and the AM-GM) 

Skillful use of the rearrangement inequality often calls for one to ex- 
ploit symmetry and to look for clever specializations of the resulting 
bounds. This problem outlines a proof of the AM-GM inequality that 
nicely illustrates these steps. 


(a) Show that for positive $c_k$, $k = 1, 2, \ldots, n$ one has

$$n \le \frac{c_1}{c_n} + \frac{c_2}{c_1} + \frac{c_3}{c_2} + \cdots + \frac{c_n}{c_{n-1}}.$$

(b) Specialize the result of part (a) to show that for all positive $x_k$,
$k = 1, 2, \ldots, n$, one has the rational bound

$$n \le \frac{x_1}{x_1 x_2 \cdots x_n} + x_2 + x_3 + \cdots + x_n.$$

(c) Specialize a third time to show that for $\rho > 0$ one also has

$$n \le \frac{\rho x_1}{\rho^n x_1 x_2 \cdots x_n} + \rho x_2 + \rho x_3 + \cdots + \rho x_n,$$

and finally indicate how the right choice of $\rho$ now yields the AM-GM
inequality (2.3).
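One natural way to carry out the specializations, assumed here rather than taken from the text, is to use $c_k = x_1 x_2 \cdots x_k$ in part (b) and then replace each $x_k$ by $\rho x_k$ with $\rho = (x_1 x_2 \cdots x_n)^{-1/n}$. With that choice $c_n = 1$, the cyclic sum collapses to $\rho(x_1 + \cdots + x_n)$, and part (a) becomes the AM-GM inequality. The Python sketch below carries out this bookkeeping numerically for sample values.

```python
import math

def cyclic_sum(c):
    """Part (a): c_1/c_n + c_2/c_1 + ... + c_n/c_{n-1}; the claim is that this is >= n."""
    return c[0] / c[-1] + sum(c[k] / c[k - 1] for k in range(1, len(c)))

def amgm_via_cyclic(xs):
    """Parts (b)-(c) with rho = (x_1 ... x_n)^(-1/n).

    Taking c_k = (rho x_1)(rho x_2)...(rho x_k) makes c_n = 1, so the cyclic sum
    collapses to rho * (x_1 + ... + x_n); part (a) then says n <= rho * sum(xs),
    which rearranges to (x_1 ... x_n)^(1/n) <= (x_1 + ... + x_n)/n.
    """
    n = len(xs)
    rho = math.prod(xs) ** (-1 / n)
    c, running = [], 1.0
    for x in xs:
        running *= rho * x
        c.append(running)
    return cyclic_sum(c), rho * sum(xs)

s, t = amgm_via_cyclic([1.0, 4.0, 16.0])
print(s, t)  # both approximately 21/4 = 5.25, which is at least n = 3
```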
Fig. 5.2. (Graph of $y = x + x^{-1}$: the map $f(x) = x + x^{-1}$ decreases on (0, 1] and
increases on [1, ∞), with $M = m^{-1}$ marked.) One key to the proof of Kantorovich’s
inequality is the geometry of the map $x \mapsto x + x^{-1}$; another key is that a
multiplicative inequality is sometimes proved most easily by first establishing an
appropriate additive inequality. To say much more would risk giving away the game.


Exercise 5.8 (Kantorovich’s Inequality for Reciprocals) 
Show that if $0 < m = x_1 \le x_2 \le \cdots \le x_n = M < \infty$ then for
nonnegative weights with $p_1 + p_2 + \cdots + p_n = 1$ one has

$$\Big\{\sum_{j=1}^n p_j x_j\Big\}\Big\{\sum_{j=1}^n \frac{p_j}{x_j}\Big\} \le \frac{\mu^2}{\gamma^2}, \tag{5.21}$$

where $\mu = (m + M)/2$ and $\gamma = \sqrt{mM}$. This bound provides a natural
complement to the elementary inequality of Exercise 1.2 (page 12), but it
also has important applications in numerical analysis where, for example,
it has been used to estimate the rate of convergence of the method of
steepest ascent. To get started with the proof, one might note that by
homogeneity it suffices to consider the case when $\gamma = 1$; the geometry
of Figure 5.2 then tells a powerful tale.
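A quick experiment shows how tight (5.21) is. The Python sketch below checks the bound on random points and weights (ranges and sample sizes are arbitrary) and evaluates the extreme case in which the weight is split evenly between the endpoints m and M, where both sides coincide.

```python
import math
import random

def kantorovich_sides(xs, ps):
    """Return both sides of (5.21); xs positive, ps nonnegative with sum 1."""
    m, M = min(xs), max(xs)
    mu, gamma = (m + M) / 2, math.sqrt(m * M)
    lhs = sum(p * x for x, p in zip(xs, ps)) * sum(p / x for x, p in zip(xs, ps))
    return lhs, (mu / gamma) ** 2

# Equality: all the weight split evenly between the extreme points m and M.
print(kantorovich_sides([1.0, 4.0], [0.5, 0.5]))  # (1.5625, 1.5625)

rng = random.Random(5)
ok = True
for _ in range(2000):
    xs = [rng.uniform(0.2, 5.0) for _ in range(6)]
    raw = [rng.random() for _ in range(6)]
    total = sum(raw)
    lhs, rhs = kantorovich_sides(xs, [r / total for r in raw])
    ok = ok and lhs <= rhs * (1 + 1e-12)
print(ok)
```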


Exercise 5.9 (Monotonicity Method) 


Suppose $a_k > 0$ and $b_k > 0$ for $k = 1, 2, \ldots, n$ and for fixed $\theta \in \mathbb{R}$
consider the function

$$f_\theta(x) = \Big\{\sum_{j=1}^n a_j^{\theta + x} b_j^{\theta - x}\Big\}\Big\{\sum_{j=1}^n a_j^{\theta - x} b_j^{\theta + x}\Big\}, \quad x \in \mathbb{R}.$$

If we set $\theta = 1$, we see that $f_1(0)^{1/2}$ gives us the left side of Cauchy’s
inequality while $f_1(1)^{1/2}$ gives us the right side. Show that $f_\theta(x)$ is a
monotone increasing function of x on [0, 1], a fact which gives us a parametric
family of inequalities containing Cauchy’s inequality as a special case.
Exercise 5.10 (A Proto-Muirhead Inequality)
If the nonnegative real numbers $a_1, a_2, b_1$, and $b_2$ satisfy

$$\max\{a_1, a_2\} \ge \max\{b_1, b_2\} \quad\text{and}\quad a_1 + a_2 = b_1 + b_2,$$

then for nonnegative x and y, one has

$$x^{b_1} y^{b_2} + x^{b_2} y^{b_1} \le x^{a_1} y^{a_2} + x^{a_2} y^{a_1}. \tag{5.22}$$


Prove this assertion by considering an appropriate factorization of the 
difference of the two sides. 


Exercise 5.11 (Chebyshev’s Inequality for Tail Probabilities) 

One of the most basic properties of the mathematical expectation $E(\cdot)$
that one meets in probability theory is that for any random variables
X and Y with finite expectations the relationship $X \le Y$ implies that
$E(X) \le E(Y)$. Use this fact to show that for any random variable Z
with finite mean $\mu = E(Z)$ and for any $\lambda > 0$ one has the inequality

$$P(|Z - \mu| \ge \lambda) \le \frac{1}{\lambda^2} E(|Z - \mu|^2). \tag{5.23}$$


This bound provides one concrete expression of the notion that a random 
variable is not likely to be too far away from its mean, and it is surely 
the most used of the several inequalities that carry Chebyshev’s name. 


6 
Convexity — The Third Pillar 


There are three great pillars of the theory of inequalities: positivity, 
monotonicity, and convexity. The notions of positivity and monotonicity 
are so intrinsic to the subject that they serve us steadily without ever 
calling attention to themselves, but convexity is different. Convexity 
expresses a second order effect, and for it to provide assistance we almost 
always need to make some deliberate preparations. 

To begin, we first recall that a function $f : [a, b] \to \mathbb{R}$ is said to be
convex provided that for all $x, y \in [a, b]$ and all $0 \le p \le 1$ one has

$$f(px + (1 - p)y) \le p f(x) + (1 - p) f(y). \tag{6.1}$$


With nothing more than this definition and the intuition offered by the 
first frame of Figure 6.1, we can set a challenge problem which creates 
a fundamental link between the notion of convexity and the theory of 
inequalities. 


Problem 6.1 (Jensen’s Inequality) 
Suppose that $f : [a, b] \to \mathbb{R}$ is a convex function and suppose that the
nonnegative real numbers $p_j$, $j = 1, 2, \ldots, n$ satisfy

$$p_1 + p_2 + \cdots + p_n = 1.$$

Show that for all $x_j \in [a, b]$, $j = 1, 2, \ldots, n$ one has

$$f\Big(\sum_{j=1}^n p_j x_j\Big) \le \sum_{j=1}^n p_j f(x_j). \tag{6.2}$$

When n = 2 we see that Jensen’s inequality (6.2) is nothing more than
the definition of convexity, so our instincts may suggest that we look for
a proof by induction. Such an approach calls for one to relate averages
of size n − 1 to averages of size n, and this can be achieved in several ways.
[Figure 6.1: Three faces of convexity, shown in frames (A), (B), and (C).]

Fig. 6.1. By definition, a function f is convex provided that it satisfies the 
condition (6.1) which is illustrated in frame (A), but a convex function may 
be characterized in several other ways. For example, frame (B) illustrates that 
a function is convex if and only if its sequential secants have increasing slopes, 
and frame (C) illustrates that a function is convex if and only if for each point 
p on its graph there is a line through p that lies below the graph. None of these
criteria requires that f be differentiable. 


One natural idea is simply to pull out the last summand and to
renormalize the sum that is left behind. More precisely, we first note
that there is no loss of generality if we assume $0 < p_n < 1$ (if
$p_n = 0$ the last term may simply be dropped, and if $p_n = 1$ the
inequality is trivial), and, in this case, we can write

$$\sum_{j=1}^n p_j x_j = p_n x_n + (1-p_n)\sum_{j=1}^{n-1} \frac{p_j}{1-p_n}\, x_j.$$


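Now, from this representation, the definition of convexity, and the
induction hypothesis, all applied in that order, we see that

$$\begin{aligned}
f\Big(\sum_{j=1}^n p_j x_j\Big) &\le p_n f(x_n) + (1-p_n)\, f\Big(\sum_{j=1}^{n-1} \frac{p_j}{1-p_n}\, x_j\Big)\\
&\le p_n f(x_n) + (1-p_n) \sum_{j=1}^{n-1} \frac{p_j}{1-p_n}\, f(x_j)\\
&= \sum_{j=1}^n p_j f(x_j).
\end{aligned}$$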
This bound completes the induction step and thus completes the solution
to one of the easiest, but most useful, of all our challenge problems.
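The induction step can be mirrored numerically. The sketch below, with hypothetical helper names, checks on random data that pulling out the last summand and renormalizing leaves the weighted mean unchanged, and that the gap in (6.2) is nonnegative for a few convex functions. It illustrates the inequality; it is not a substitute for the proof.

```python
import math
import random


def weighted_mean(ps, xs):
    return sum(p * x for p, x in zip(ps, xs))


def jensen_gap(f, ps, xs):
    """Right side minus left side of (6.2); nonnegative whenever f is convex."""
    return weighted_mean(ps, [f(x) for x in xs]) - f(weighted_mean(ps, xs))


def renormalized_split(ps, xs):
    """p_n x_n + (1 - p_n) * (mean of x_1..x_{n-1} with weights p_j / (1 - p_n))."""
    pn = ps[-1]
    rest = [p / (1 - pn) for p in ps[:-1]]
    return pn * xs[-1] + (1 - pn) * weighted_mean(rest, xs[:-1])


random.seed(0)
convex_functions = [math.exp, lambda t: t * t, abs]
for _ in range(1000):
    n = random.randint(2, 6)
    raw = [random.random() + 1e-9 for _ in range(n)]
    ps = [r / sum(raw) for r in raw]          # positive weights summing to 1
    xs = [random.uniform(-2.0, 2.0) for _ in range(n)]
    assert math.isclose(renormalized_split(ps, xs), weighted_mean(ps, xs), abs_tol=1e-12)
    for f in convex_functions:
        assert jensen_gap(f, ps, xs) >= -1e-12
```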


THE CASE OF EQUALITY 


We will find many applications of Jensen’s inequality, and some of 
the most engaging of these will depend on understanding the conditions 
where one has equality. Here it is useful to restrict attention to those 
functions $f:[a,b]\to\mathbb{R}$ such that for all $x, y \in [a,b]$ and all $0 < p < 1$
and $x \ne y$ one has the strict inequality

$$f(px + (1-p)y) < pf(x) + (1-p)f(y). \qquad (6.3)$$


Such functions are said to be strictly convex, and they help us frame the 


next challenge problem. 


Problem 6.2 (The Case of Equality in Jensen’s Inequality) 
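Suppose that $f:[a,b]\to\mathbb{R}$ is strictly convex and show that if

$$f\Big(\sum_{j=1}^n p_j x_j\Big) = \sum_{j=1}^n p_j f(x_j), \qquad (6.4)$$

where the positive reals $p_j$, $j = 1, 2, \ldots, n$ have sum $p_1 + p_2 + \cdots + p_n = 1$,
then one must have

$$x_1 = x_2 = \cdots = x_n. \qquad (6.5)$$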


Once more, our task is easy, but, as with Jensen’s inequality, the 
importance of the result justifies its role as a challenge problem. For 
many inequalities one discovers when equality can hold by taking the 
proof of the inequality and running it backwards. This approach works 
perfectly well with Jensen’s inequality, but the logic of the argument still
deserves some attention.

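First, if the conclusion (6.5) does not hold, then the set

$$S = \{\, j : x_j \ne \max_k x_k \,\}$$

is a nonempty proper subset of $\{1, 2, \ldots, n\}$, and we will argue that
this leads one to a contradiction. To see why this is so, we first set

$$p = \sum_{j \in S} p_j, \qquad x = \sum_{j \in S} \frac{p_j}{p}\, x_j, \qquad y = \sum_{j \notin S} \frac{p_j}{1-p}\, x_j,$$

so that $0 < p < 1$ and $x < y = \max_k x_k$,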



from which we note that the strict convexity of $f$ implies

$$f\Big(\sum_{j=1}^n p_j x_j\Big) = f(px + (1-p)y) < pf(x) + (1-p)f(y). \qquad (6.6)$$

Moreover, by the plain vanilla convexity of $f$ applied separately at $x$ and
$y$, we also have the inequality

$$pf(x) + (1-p)f(y) \le p\sum_{j\in S} \frac{p_j}{p}\, f(x_j) + (1-p)\sum_{j\notin S} \frac{p_j}{1-p}\, f(x_j) = \sum_{j=1}^n p_j f(x_j).$$

Finally, from this bound and the strict inequality (6.6), we find

$$f\Big(\sum_{j=1}^n p_j x_j\Big) < \sum_{j=1}^n p_j f(x_j),$$


and since this inequality contradicts the assumption (6.4), the solution 
of the challenge problem is complete. 
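The role of strict convexity in Problem 6.2 shows up in a tiny example (the weights and points below are arbitrary choices for illustration). For the strictly convex map $t \mapsto t^2$ the gap in Jensen’s inequality vanishes only when all the points coincide; in fact it equals the weighted variance. For an affine map, which is convex but not strictly convex, the gap is zero even for distinct points, so the strictness hypothesis cannot be dropped.

```python
import math


def jensen_gap(f, ps, xs):
    mean = sum(p * x for p, x in zip(ps, xs))
    return sum(p * f(x) for p, x in zip(ps, xs)) - f(mean)


def square(t):
    return t * t          # strictly convex


def affine(t):
    return 2.0 * t + 1.0  # convex, but not strictly convex


ps = [0.2, 0.3, 0.5]
equal_points = [1.5, 1.5, 1.5]
spread_points = [0.0, 1.0, 3.0]

gap_square_equal = jensen_gap(square, ps, equal_points)
gap_square_spread = jensen_gap(square, ps, spread_points)
gap_affine_spread = jensen_gap(affine, ps, spread_points)
print(gap_square_equal, gap_square_spread, gap_affine_spread)
```

The first and third gaps are zero up to rounding, while the second equals 1.56, the weighted variance of the spread points.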


THE DIFFERENTIAL CRITERION FOR CONVEXITY 


A key benefit of Jensen’s inequality is its generality, but before Jensen’s 
inequality can be put to work in a concrete problem, one needs to es- 
tablish the convexity of the relevant function. On some occasions this 
can be achieved by direct application of the definition (6.1), but more 
commonly, convexity is established by applying the differential criterion 
provided by the next challenge problem. 


Problem 6.3 (Differential Criterion for Convexity) 
Show that if $f:(a,b)\to\mathbb{R}$ is twice differentiable, then

$$f''(x) \ge 0 \text{ for all } x \in (a,b) \text{ implies } f(\cdot) \text{ is convex on } (a,b),$$

and, in parallel, show that

$$f''(x) > 0 \text{ for all } x \in (a,b) \text{ implies } f(\cdot) \text{ is strictly convex on } (a,b).$$


If one simply visualizes the meaning of the condition $f''(x) \ge 0$, then
this problem may seem rather obvious. Nevertheless, if one wants a 
complete proof, rather than an intuitive sketch, then the problem is not 
as straightforward as the graphs of Figure 6.1 might suggest. 

Here, since we need to relate the function f to its derivatives, it is 
perhaps most natural to begin with the representation of f provided 
by the fundamental theorem of calculus. Specifically, if we fix a value
$x_0 \in (a,b)$, then we have the representation

$$f(x) = f(x_0) + \int_{x_0}^{x} f'(u)\,du \quad \text{for all } x \in (a,b), \qquad (6.7)$$


and once this formula is written down, we may not need long to think
of exploiting the hypothesis $f''(\cdot) \ge 0$ by noting that it implies that
the integrand $f'(\cdot)$ is nondecreasing. In fact, our hypothesis contains
no further information, so the representation (6.7), the monotonicity of
$f'(\cdot)$, and honest arithmetic must carry us the rest of the way.

To forge ahead, we take $a < x < y < b$ and $0 < p < 1$, and we also
set $q = 1 - p$, so by applying the representation (6.7) to $x$, $y$, and
$x_0 = px + qy$ we see that $\Delta = pf(x) + qf(y) - f(px+qy)$ may be
written as

$$\Delta = q\int_{px+qy}^{y} f'(u)\,du - p\int_{x}^{px+qy} f'(u)\,du. \qquad (6.8)$$

For $u \in [x, px+qy]$ one has $f'(u) \le f'(px+qy)$, and this interval has
length $q(y-x)$, so we have the bound

$$p\int_{x}^{px+qy} f'(u)\,du \le qp(y-x)\, f'(px+qy), \qquad (6.9)$$

while for $u \in [px+qy, y]$ one has $f'(u) \ge f'(px+qy)$, and this interval
has length $p(y-x)$, so we have the matching bound

$$q\int_{px+qy}^{y} f'(u)\,du \ge qp(y-x)\, f'(px+qy). \qquad (6.10)$$

Therefore, from the integral representation (6.8) for $\Delta$ and the two
monotonicity estimates (6.9) and (6.10), we find $\Delta \ge 0$, just as we
needed to complete the solution of the first half of the problem.

For the second half of the problem, we only need to note that if
$f''(x) > 0$ for all $x \in (a,b)$, then $f'$ is strictly increasing, so both
of the inequalities (6.9) and (6.10) are strict. Thus, the representation
(6.8) for $\Delta$ gives us $\Delta > 0$, and we have the strict convexity of $f$.

Before leaving this challenge problem, we should note that there is an 
alternative way to proceed that is also quite instructive. In particular, 
one can rely on Rolle’s theorem to help estimate $\Delta$ by comparison to an
appropriate polynomial; this solution is outlined in Exercise 6.10. 
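For a concrete view of the argument, take $f(x) = e^x$, for which $f' = f'' = e^x$ and every integral in (6.8) through (6.10) has a closed form. The sketch below (the helper name and sample points are illustrative choices) confirms the representation (6.8), the two monotonicity bounds, and the strict sign of $\Delta$.

```python
import math


def delta_and_pieces(x, y, p):
    """For f = exp, return Delta, the two weighted integrals in (6.8), and qp(y-x)f'(px+qy)."""
    q = 1.0 - p
    x0 = p * x + q * y
    delta = p * math.exp(x) + q * math.exp(y) - math.exp(x0)
    left = p * (math.exp(x0) - math.exp(x))    # p * integral of f' over [x, x0]
    right = q * (math.exp(y) - math.exp(x0))   # q * integral of f' over [x0, y]
    pivot = p * q * (y - x) * math.exp(x0)     # common value in (6.9) and (6.10)
    return delta, left, right, pivot


for x, y, p in [(-1.0, 2.0, 0.3), (0.0, 0.5, 0.9), (1.0, 4.0, 0.5)]:
    delta, left, right, pivot = delta_and_pieces(x, y, p)
    assert math.isclose(delta, right - left)   # representation (6.8)
    assert left <= pivot <= right              # bounds (6.9) and (6.10)
    assert delta > 0                           # strict, since f'' = exp > 0
```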


THE AM-GM INEQUALITY AND THE SPECIAL NATURE OF $x \mapsto e^x$


The derivative criterion tells us that the map $x \mapsto e^x$ is convex, so
Jensen’s inequality tells us that for all real $y_1, y_2, \ldots, y_n$ and all
positive $p_j$, $j = 1, 2, \ldots, n$ with $p_1 + p_2 + \cdots + p_n = 1$, one has

$$\exp\Big(\sum_{j=1}^n p_j y_j\Big) \le \sum_{j=1}^n p_j e^{y_j}.$$

Now, when we set $x_j = e^{y_j}$, we then find the familiar relation

$$\prod_{j=1}^n x_j^{p_j} \le \sum_{j=1}^n p_j x_j.$$


Thus, with lightning speed and crystal clear logic, Jensen’s inequality 
leads one to the general AM-GM bound. 
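The substitution $x_j = e^{y_j}$ also suggests a direct way to compute the weighted geometric mean: exponentiate the weighted average of logarithms. The sketch below (helper names are illustrative) uses that route and checks the weighted AM-GM bound on random positive data.

```python
import math
import random


def weighted_gm(ps, xs):
    # prod x_j^{p_j} computed as exp(sum p_j * log x_j), i.e. with y_j = log x_j
    return math.exp(sum(p * math.log(x) for p, x in zip(ps, xs)))


def weighted_am(ps, xs):
    return sum(p * x for p, x in zip(ps, xs))


random.seed(1)
for _ in range(1000):
    n = random.randint(1, 6)
    raw = [random.random() + 1e-9 for _ in range(n)]
    ps = [r / sum(raw) for r in raw]
    xs = [random.uniform(0.01, 100.0) for _ in range(n)]
    assert weighted_gm(ps, xs) <= weighted_am(ps, xs) * (1 + 1e-12)
```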

Finally, this view of the AM-GM inequality as a special instance of
Jensen’s inequality for the function $x \mapsto e^x$ puts the AM-GM
inequality in a unique light, one that may reveal the ultimate source of
its vitality. Quite possibly, the pervasive value of the AM-GM bound
throughout the theory of inequalities is simply one more reflection of the
fundamental role of the exponential function as an isomorphism between
the two most important groups in mathematics: addition on the real line
and multiplication on the positive real line.


HOW TO USE CONVEXITY IN A TYPICAL PROBLEM


Many of the familiar functions of trigonometry and geometry have 
easily established convexity properties, and, more often than not, this 
convexity has useful consequences. The next challenge problem comes 
with no hint of convexity in its statement, but, if one is sensitive to the 
way Jensen’s inequality helps us understand averages, then the required 
convexity is not hard to find. 


Problem 6.4 (On the Maximum of the Product of Two Edges) 


In an equilateral triangle with area $A$, the product of any two sides
is equal to $(4/\sqrt{3})A$. Show that this represents the extreme case in the
sense that for a triangle with area $A$ there must exist two sides the lengths
of which have a product that is at least as large as $(4/\sqrt{3})A$.


To get started we need formulas which relate edge lengths to areas, 
and, in the traditional notation of Figure 6.2, there are three equally 
viable formulas: 


$$A = \tfrac{1}{2}ab\sin\gamma = \tfrac{1}{2}ac\sin\beta = \tfrac{1}{2}bc\sin\alpha.$$
[Figure 6.2: The area $A$ of the generic triangle, with sides $a$, $b$, $c$ opposite the angles $\alpha$, $\beta$, $\gamma$, has three basic representations: $A = \tfrac{1}{2}ab\sin\gamma = \tfrac{1}{2}ac\sin\beta = \tfrac{1}{2}bc\sin\alpha$.]

Fig. 6.2. All of the trigonometric functions are convex (or concave) if their
arguments are restricted to an appropriate domain, and, as a result,
Jensen’s inequality has many interesting geometric consequences.


Now, if we average these representations, then we find that

$$\frac{1}{3}(ab + ac + bc) = \frac{2A}{3}\Big(\frac{1}{\sin\alpha} + \frac{1}{\sin\beta} + \frac{1}{\sin\gamma}\Big), \qquad (6.11)$$
and this is a formula that almost begs us to ask about the convexity of
$1/\sin x$. The plot of $x \mapsto 1/\sin x$ for $x \in (0,\pi)$ certainly looks convex,
and our suspicions can be confirmed by calculating the second derivative,

$$\frac{d^2}{dx^2}\Big(\frac{1}{\sin x}\Big) = \frac{1}{\sin x} + \frac{2\cos^2 x}{\sin^3 x} > 0 \quad \text{for all } x \in (0,\pi). \qquad (6.12)$$

Therefore, since we have $(\alpha + \beta + \gamma)/3 = \pi/3$, we find from Jensen’s
inequality that

$$\frac{1}{3}\Big(\frac{1}{\sin\alpha} + \frac{1}{\sin\beta} + \frac{1}{\sin\gamma}\Big) \ge \frac{1}{\sin(\pi/3)} = \frac{2}{\sqrt{3}},$$


so, by the formula (6.11), we do obtain the conjectured bound

$$\max(ab, ac, bc) \ge \frac{1}{3}(ab + ac + bc) \ge \frac{4}{\sqrt{3}}\,A. \qquad (6.13)$$
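A numerical experiment makes the bound (6.13), and Weitzenböck’s inequality $a^2 + b^2 + c^2 \ge 4\sqrt{3}\,A$ that follows from it, tangible. The sketch below draws random triangles with vertices in the unit square (an arbitrary sampling choice), computes the area from a cross product, and checks both bounds; the equilateral triangle gives equality.

```python
import math
import random


def sides_and_area(P, Q, R):
    a = math.dist(Q, R)  # side opposite the vertex P
    b = math.dist(P, R)
    c = math.dist(P, Q)
    area = abs((Q[0] - P[0]) * (R[1] - P[1]) - (R[0] - P[0]) * (Q[1] - P[1])) / 2
    return a, b, c, area


def check_triangle(P, Q, R, tol=1e-12):
    a, b, c, A = sides_and_area(P, Q, R)
    products = (a * b, a * c, b * c)
    avg = sum(products) / 3
    ok_613 = max(products) >= avg - tol and avg >= 4 * A / math.sqrt(3) - tol
    ok_weitzenbock = a * a + b * b + c * c >= 4 * math.sqrt(3) * A - tol
    return ok_613 and ok_weitzenbock


random.seed(2)
for _ in range(10000):
    pts = [(random.random(), random.random()) for _ in range(3)]
    assert check_triangle(*pts)
```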


CONNECTIONS AND REFINEMENTS 


This challenge problem is closely related to a well-known inequality
of Weitzenböck which asserts that in any triangle one has

$$a^2 + b^2 + c^2 \ge 4\sqrt{3}\,A. \qquad (6.14)$$


In fact, to pass from the bound (6.13) to Weitzenböck’s inequality one
only has to recall that

$$ab + ac + bc \le a^2 + b^2 + c^2,$$

which is a familiar fact that one can obtain in at least three ways:
Cauchy’s inequality, the AM-GM bound, or the rearrangement inequality
will all do the trick with equal grace.

Weitzenböck’s inequality turns out to have many instructive proofs;
Engel (1998) gives eleven! It also has several informative refinements,
one of which is developed in Exercise 6.9 with help from the convexity
of the map $x \mapsto \tan x$ on $[0, \pi/2)$.


HOW TO DO BETTER MUCH OF THE TIME


There are some mathematical methods which one might call generic 
improvers; broadly speaking, these are methods that can be used in 
a semi-automatic way to generalize an identity, refine an inequality, or 
otherwise improve a given result. A classic example which we saw earlier 
is the polarization device (see page 49) which often enables one to convert 
an identity for squares into a more general identity for products. 

The next challenge problem provides an example of a different sort. It 
suggests how one might think about sharpening almost any result that 
is obtained via Jensen’s inequality. 


Problem 6.5 (Hölder’s Defect Formula)
If $f:[a,b]\to\mathbb{R}$ is twice differentiable and if we have the bounds

$$0 \le m \le f''(x) \le M \quad \text{for all } x \in [a,b], \qquad (6.15)$$

then for any real values $a \le x_1 \le x_2 \le \cdots \le x_n \le b$ and any nonnegative
reals $p_k$, $k = 1, 2, \ldots, n$ with $p_1 + p_2 + \cdots + p_n = 1$, there exists a real
value $\mu \in [m, M]$ for which one has the formula

$$\sum_{k=1}^n p_k f(x_k) - f\Big(\sum_{k=1}^n p_k x_k\Big) = \frac{\mu}{4}\sum_{j=1}^n\sum_{k=1}^n p_j p_k (x_j - x_k)^2. \qquad (6.16)$$


CONTEXT AND A PLAN 


This result is from the same famous 1889 paper of Otto Ludwig Hölder
(1859–1937) in which one finds his proof of the inequality that has
come to be known universally as “Hölder’s inequality.” The defect
formula (6.16) is much less well known, but it is nevertheless valuable. It
provides a perfectly natural measure of the difference between the two 
sides of Jensen’s inequality, and it tells us how to beat the plain vanilla 
version of Jensen’s inequality whenever we can check the additional hy- 
pothesis (6.15). More often than not, the extra precision does not justify 
the added complexity, but it is a safe bet that some good problems are 
waiting to be cracked with just this refinement. 
Hölder’s defect formula (6.16) also deepens one’s understanding of
the relationship of convex functions to the simpler affine or quadratic
functions. For example, if the difference $M - m$ is small, the bound
(6.16) tells us that $f$ behaves rather like a quadratic function on $[a,b]$.
Moreover, in the extreme case when $m = M$, one finds that $f$ is exactly
quadratic, say $f(x) = \alpha + \beta x + \gamma x^2$ with $m = M = \mu = 2\gamma$, and the
defect formula (6.16) reduces to a simple quadratic identity.

Similarly, if $M$ is small, say $0 \le M \le \epsilon$, then the bound (6.16)
tells us that $f$ behaves rather like an affine function $f(x) = \alpha + \beta x$.
For an exactly affine function, the left-hand side of the bound (6.16)
is identically equal to zero, but in general the bound (6.16) asserts a
more subtle relation. More precisely, it tells us that the left-hand side
is a small multiple of a measure of the extent to which the values $x_j$,
$j = 1, 2, \ldots, n$ are diffused throughout the interval $[a,b]$.


CONSIDERATION OF THE CONDITION 


This challenge problem leads us quite naturally to an intermediate
question: How can we use the fact that $0 \le m \le f''(x) \le M$? Once this
question is asked, one may not need long to observe that the two closely
related functions

$$g(x) = \tfrac{1}{2}Mx^2 - f(x) \quad\text{and}\quad h(x) = f(x) - \tfrac{1}{2}mx^2$$

are again convex. In turn, this observation almost begs us to ask what
Jensen’s inequality says for these functions.
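For $g(x)$, Jensen’s inequality gives us the bound

$$\tfrac{1}{2}M\bar{x}^2 - f(\bar{x}) \le \sum_{k=1}^n p_k\big\{\tfrac{1}{2}Mx_k^2 - f(x_k)\big\},$$

where we have set $\bar{x} = p_1x_1 + p_2x_2 + \cdots + p_nx_n$, and this bound is easily
rearranged to yield

$$\Big\{\sum_{k=1}^n p_k f(x_k)\Big\} - f(\bar{x}) \le \tfrac{1}{2}M\Big\{\Big(\sum_{k=1}^n p_k x_k^2\Big) - \bar{x}^2\Big\} = \tfrac{1}{2}M\sum_{k=1}^n p_k (x_k - \bar{x})^2.$$

The perfectly analogous computation for $h(x)$ gives us a lower bound

$$\Big\{\sum_{k=1}^n p_k f(x_k)\Big\} - f(\bar{x}) \ge \tfrac{1}{2}m\sum_{k=1}^n p_k (x_k - \bar{x})^2,$$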


and these upper and lower bounds almost complete the proof of the
assertion (6.16). The only missing element is the identity

$$\sum_{k=1}^n p_k (x_k - \bar{x})^2 = \frac{1}{2}\sum_{j=1}^n\sum_{k=1}^n p_j p_k (x_j - x_k)^2,$$

which is easily checked by algebraic expansion and the definition of $\bar{x}$.
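The whole chain of reasoning can be checked numerically. In the sketch below (helper names and test data are illustrative assumptions), $f(x) = e^x$ on $[0,1]$ satisfies (6.15) with $m = 1$ and $M = e$; the code verifies the variance identity, the two-sided bound, and that the implied value of $\mu$ in (6.16) lands in $[m, M]$. For a quadratic $f(x) = \alpha + \beta x + \gamma x^2$ the implied $\mu$ is exactly $2\gamma$.

```python
import math
import random


def defect_pieces(f, ps, xs):
    """Return (Jensen defect, sum p_k (x_k - xbar)^2, double sum p_j p_k (x_j - x_k)^2)."""
    xbar = sum(p * x for p, x in zip(ps, xs))
    defect = sum(p * f(x) for p, x in zip(ps, xs)) - f(xbar)
    variance = sum(p * (x - xbar) ** 2 for p, x in zip(ps, xs))
    double_sum = sum(pj * pk * (xj - xk) ** 2
                     for pj, xj in zip(ps, xs) for pk, xk in zip(ps, xs))
    return defect, variance, double_sum


m, M = 1.0, math.e  # bounds on f'' = exp over [0, 1]
random.seed(3)
for _ in range(1000):
    n = random.randint(2, 6)
    raw = [random.random() + 1e-9 for _ in range(n)]
    ps = [r / sum(raw) for r in raw]
    xs = sorted(random.uniform(0.0, 1.0) for _ in range(n))
    defect, variance, double_sum = defect_pieces(math.exp, ps, xs)
    assert math.isclose(variance, double_sum / 2, rel_tol=1e-9, abs_tol=1e-15)
    assert m / 2 * variance - 1e-12 <= defect <= M / 2 * variance + 1e-12
    if double_sum > 1e-6:
        mu = 4 * defect / double_sum   # the value of mu in (6.16)
        assert m - 1e-6 <= mu <= M + 1e-6
```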


PREVAILING AFTER A NEAR FAILURE 


Convexity and Jensen’s inequality provide straightforward solutions to 
many problems. Nevertheless, they will sometimes run into an unexpected
roadblock. Our next challenge comes from the famous problem section of 
the American Mathematical Monthly, and it provides a classic example 
of this phenomenon. 

At first the problem looks invitingly easy, but, soon enough, it presents 
difficulties. Fortunately, these turn out to be of a generous kind. After 
we deepen our understanding of convex functions, we find that Jensen’s 
inequality does indeed prevail. 


Problem 6.6 (AMM 2002, Proposed by M. Mazur) 


Show that if $a$, $b$, and $c$ are positive real numbers for which one has
the lower bound $abc \ge 2^9$, then

$$\frac{1}{\sqrt{1 + (abc)^{1/3}}} \le \frac{1}{3}\Big(\frac{1}{\sqrt{1+a}} + \frac{1}{\sqrt{1+b}} + \frac{1}{\sqrt{1+c}}\Big). \qquad (6.17)$$


The average on the right-hand side suggests that Jensen’s inequality
might prove useful, while the geometric mean on the left-hand side
suggests that the exponential function will have a role. With more
exploration, and some luck, one may not need long to guess that the
function

$$f(x) = \frac{1}{\sqrt{1 + e^x}}$$

might help bring Jensen’s inequality properly into play. In fact, once
this function is written down, one may check almost without calculation
that the proposed inequality (6.17) is equivalent to the assertion that

$$f\Big(\frac{x+y+z}{3}\Big) \le \frac{1}{3}\big\{f(x) + f(y) + f(z)\big\} \qquad (6.18)$$

for all real $x$, $y$, and $z$ such that $\exp(x + y + z) \ge 2^9$.
[Figure 6.3: The new convex function $g$ agrees with $f$ on $[3\log 2, \infty)$, and it agrees with the tangent of $f$ on $[0, 3\log 2]$; the horizontal axis marks $\log 2$ and $3\log 2$.]

Fig. 6.3. Effective use of Jensen’s inequality calls for one to find a function
that is convex on all of $[0,\infty)$ and that is never larger than $f$. (Note: To make
the concavity of $f$ on $[0, \log 2)$ visible, the graph is not drawn to scale.)

To see if Jensen’s inequality may be applied, we need to assess the
convexity properties of $f$, so we just differentiate twice to find


$$f'(x) = -\frac{e^x}{2(1+e^x)^{3/2}}$$

and

$$f''(x) = \frac{e^x(e^x - 2)}{4(1+e^x)^{5/2}}.$$


The second formula tells us that $f''(x) \ge 0$ if and only if we have $e^x \ge 2$,
so by Jensen’s inequality one finds that the target inequality (6.17) holds
provided that each of the terms $a$, $b$, and $c$ is at least as large as 2.
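Because the sign change of $f''$ at $x = \log 2$ drives the rest of the discussion, it is worth checking the closed form for $f''$ against a central second difference. The sketch below does this at a few sample points; the step size $h$ and the tolerance are arbitrary choices.

```python
import math


def f(x):
    return 1 / math.sqrt(1 + math.exp(x))


def f2(x):
    # closed form: e^x (e^x - 2) / (4 (1 + e^x)^{5/2})
    ex = math.exp(x)
    return ex * (ex - 2) / (4 * (1 + ex) ** 2.5)


def f2_numeric(x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


for x in (-1.0, 0.0, 0.5, 1.0, 2.0, 3.0):
    assert abs(f2(x) - f2_numeric(x)) < 1e-6

print(f2(0.5) < 0, f2(1.0) > 0)  # concave below log 2, convex above it
```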


DIFFICULTIES, EXPLORATIONS, AND POSSIBILITIES 


The difficulty we face here is that the hypothesis of Problem 6.6 only
tells us that the product $abc$ is at least as large as $2^9$; we are not given any
bounds on the individual terms except that a > 0, b > 0, and c > 0. 
Thus, Jensen’s inequality cannot complete the proof all by itself, and we 
must seek help from some other resources. 

There are many ideas one might try, but before going too far, one 
should surely consider the graph of $f(x)$. What one finds from the plot
in Figure 6.3 is that $f(x)$ looks remarkably convex over the interval
$[0, 10]$ despite the fact that calculation shows that $f(x)$ is concave on
$[0, \log 2]$ and convex on $[\log 2, \infty)$. Thus, our plot holds out new hope;
perhaps some small modification of f might have the convexity that we 
need to solve our problem. 
THE IDEA OF A CONVEX MINORANT


When we think about the way we hoped to use f with Jensen’s in- 
equality, we soon realize that we can make our task a little bit easier. 
Suppose, for example, that we can find a convex function $g:[0,\infty)\to\mathbb{R}$
such that we have both the condition

$$g(x) \le f(x) \quad \text{for all } x \in [0,\infty) \qquad (6.19)$$

and the complementary condition

$$g(x) = f(x) \quad \text{for all } x \ge 3\log 2. \qquad (6.20)$$


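For such a function, Jensen’s inequality would tell us that for $x$, $y$, and
$z$ with $\exp(x + y + z) \ge 2^9$ we have the bound

$$\begin{aligned}
f\Big(\frac{x+y+z}{3}\Big) &= g\Big(\frac{x+y+z}{3}\Big)\\
&\le \frac{1}{3}\big\{g(x) + g(y) + g(z)\big\}\\
&\le \frac{1}{3}\big\{f(x) + f(y) + f(z)\big\}.
\end{aligned}$$

Here the first equality uses (6.20), since $(x+y+z)/3 \ge 3\log 2$. The first
and last terms of this bound recover the inequality (6.18), so the
solution of the challenge problem would be complete except for one small
detail: we still need to show that there is a convex $g$ on $[0,\infty)$ such
that $g(x) \le f(x)$ for $x \in [0, 3\log 2]$ and $f(x) = g(x)$ for all $x \ge 3\log 2$.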


CONSTRUCTION OF THE CONVEX MINORANT 


One way to construct a convex function $g$ with the minorization
properties described above is to just take $g(x) = f(x)$ for $x \ge 3\log 2$ and
to define $g(x)$ on $[0, 3\log 2]$ by linear extrapolation. Thus, for
$x \in [0, 3\log 2]$, we take

$$g(x) = f(3\log 2) + (x - 3\log 2)\,f'(3\log 2) = \frac{1}{3} + (3\log 2 - x)\,\frac{4}{27}.$$


Three simple observations now suffice to show that $g(x) \le f(x)$ for
all $x \ge 0$. First, for $x \ge 3\log 2$, we have $g(x) = f(x)$ by definition.
Second, for $\log 2 \le x \le 3\log 2$ we have $g(x) \le f(x)$ because in this
range $g(x)$ has the value of a tangent line to $f(x)$, and by convexity of $f$
on $\log 2 \le x \le 3\log 2$ the tangent line is below $f$. Third, in the critical
region $0 \le x \le \log 2$ we have $g(x) \le f(x)$ because (i) $f$ is concave, (ii)




$g$ is linear, and (iii) $f$ is larger than $g$ at the end points of the interval
$[0, \log 2]$. More precisely, at the first end point one has

$$g(0) = 0.641\ldots < f(0) = \frac{1}{\sqrt{2}} = 0.707\ldots,$$

while at the second end point one has

$$g(\log 2) = 0.538\ldots < f(\log 2) = \frac{1}{\sqrt{3}} = 0.577\ldots.$$


Thus, the convex function $g$ is indeed a minorant of $f$ which agrees with
$f$ on $[3\log 2, \infty)$, so the solution to the challenge problem is complete.
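The construction of $g$ is easy to test on a grid. The sketch below builds $g$ as described (the grid on $[0, 5]$ and the tolerances are arbitrary choices) and checks that $g \le f$, that $g$ has nonnegative second differences, and that the endpoint values agree with those quoted above.

```python
import math


def f(x):
    return 1 / math.sqrt(1 + math.exp(x))


c = 3 * math.log(2)                                    # splice point 3 log 2
slope = -math.exp(c) / (2 * (1 + math.exp(c)) ** 1.5)  # f'(3 log 2) = -4/27


def g(x):
    # f itself on [3 log 2, inf), the tangent line at 3 log 2 to the left of it
    return f(x) if x >= c else f(c) + (x - c) * slope


grid = [5.0 * k / 10000 for k in range(10001)]
values = [g(x) for x in grid]
assert all(gx <= f(x) + 1e-12 for x, gx in zip(grid, values))
# discrete convexity: second differences of g on the grid are nonnegative
assert all(values[i - 1] - 2 * values[i] + values[i + 1] >= -1e-12
           for i in range(1, len(values) - 1))
print(round(g(0), 4), round(f(0), 4), round(g(math.log(2)), 4), round(f(math.log(2)), 4))
```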


JENSEN’S INEQUALITY IN PERSPECTIVE 


Jensen’s inequality may lack the primordial nature of either Cauchy’s 
inequality or the AM-GM inequality, but, if one were forced to pick a 
single result on which to build a theory of mathematical inequalities, 
Jensen’s inequality would be an excellent choice. It can be used as a 
starting point for the proofs of almost all of the results we have seen so 
far, and, even then, it is far from exhausted. 


EXERCISES 


Exercise 6.1 (A Renaissance Inequality) 
The Renaissance mathematician Pietro Mengoli (1625–1686) only needed
simple algebra to prove the pleasing symmetric inequality

$$\frac{1}{x-1} + \frac{1}{x} + \frac{1}{x+1} > \frac{3}{x} \quad \text{for all } x > 1, \qquad (6.21)$$

yet he achieved a modest claim on intellectual immortality when he used
it to give one of the earliest proofs of the divergence of the harmonic
series,

$$H = 1 + \frac{1}{2} + \frac{1}{3} + \cdots = \lim_{n\to\infty} H_n = \infty. \qquad (6.22)$$

Rediscover Mengoli’s algebraic proof of the inequality (6.21) and check
that it also follows from Jensen’s inequality. Further, show, as Mengoli
did, that the inequality (6.21) implies the divergence of $H_n$.


Exercise 6.2 (A Perfect Cube and a Triple Product) 
Show that if $x, y, z > 0$ and $x + y + z = 1$ then one has

$$64 \le \Big(1 + \frac{1}{x}\Big)\Big(1 + \frac{1}{y}\Big)\Big(1 + \frac{1}{z}\Big).$$
[Figure 6.4: An inscribed polygon can be decomposed into triangles like the shaded one, whose area is determined by its central angle $\theta$.]

Fig. 6.4. If a convex polygon with n sides is inscribed in the unit circle, our visual
imagination suggests that the area is maximized only by a regular polygon.
This conjecture can be proved by methods which would have been familiar to
Euclid, but a modern proof by convexity is easier.


Exercise 6.3 (Area Inequality for n-gons) 


Figure 6.4 suggests that among all convex $n$-sided polygons
that one can inscribe in a circle, only the regular $n$-gon has maximal
area. Can Jensen’s inequality be used to confirm this suggestion?


Exercise 6.4 (Investment Inequalities) 


If $0 < r_k < \infty$, and if our investment of one dollar in year $k$ grows
to $1 + r_k$ dollars at the end of the year, we call $r_k$ the return on our
investment in year $k$. Show that the value $V = (1+r_1)(1+r_2)\cdots(1+r_n)$
of our investments after $n$ years must satisfy the bounds

$$(1 + r_G)^n \le \prod_{k=1}^n (1 + r_k) \le (1 + r_A)^n, \qquad (6.23)$$

where $r_G = (r_1 r_2 \cdots r_n)^{1/n}$ and $r_A = (r_1 + r_2 + \cdots + r_n)/n$. Also
explain why this bound might be viewed as a refinement of the AM-GM
inequality.
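A quick simulation illustrates the bounds (6.23) without proving them. The random returns below are arbitrary illustrative data and the helper name is hypothetical; when all the returns are equal, the three quantities coincide.

```python
import math
import random


def investment_bounds(returns):
    """Return ((1 + r_G)^n, product of (1 + r_k), (1 + r_A)^n)."""
    n = len(returns)
    r_geo = math.prod(returns) ** (1 / n)
    r_arith = sum(returns) / n
    value = math.prod(1 + r for r in returns)
    return (1 + r_geo) ** n, value, (1 + r_arith) ** n


random.seed(4)
for _ in range(1000):
    rs = [random.uniform(0.0, 0.5) for _ in range(random.randint(1, 10))]
    low, value, high = investment_bounds(rs)
    assert low <= value * (1 + 1e-12) and value <= high * (1 + 1e-12)
```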


Exercise 6.5 (Superadditivity of the Geometric Means) 


We have seen before in Exercise 2.11 that for nonnegative $a_j$ and $b_j$,
$j = 1, 2, \ldots, n$ one has superadditivity of the geometric mean:

$$(a_1a_2\cdots a_n)^{1/n} + (b_1b_2\cdots b_n)^{1/n} \le \big\{(a_1+b_1)(a_2+b_2)\cdots(a_n+b_n)\big\}^{1/n}.$$

Does this also follow from Jensen’s inequality?
Exercise 6.6 (Cauchy’s Technique and Jensen’s Inequality)

In 1906, J.L.W.V. Jensen wrote an article that was inspired by the
proof given by Cauchy for the AM-GM inequality, and, in an effort to
get to the heart of Cauchy’s argument, Jensen introduced the class of
functions that satisfy the inequality

$$f\Big(\frac{x+y}{2}\Big) \le \frac{f(x) + f(y)}{2} \quad \text{for all } x, y \in [a,b]. \qquad (6.24)$$


Such functions are now called J-convex functions, and, as we note below 
in Exercise 6.7, they are just slightly more general than the convex 
functions defined by condition (6.1). 

For a moment, step into Jensen’s shoes and show how one can modify
Cauchy’s leap-forward fall-back induction (page 20) to prove that for all
J-convex functions one has

$$f\Big(\frac{1}{n}\sum_{k=1}^n x_k\Big) \le \frac{1}{n}\sum_{k=1}^n f(x_k) \quad \text{for all } \{x_k : 1 \le k \le n\} \subset [a,b]. \qquad (6.25)$$

Here one might note that near the end of his 1906 article, Jensen
expressed the bold view that perhaps someday the class of convex functions
might be seen to be as fundamental as the class of positive functions or the
class of increasing functions. If one allows for the mild shift from the
specific notion of J-convexity to the more modern interpretation of
convexity (6.1), then Jensen’s view turned out to be quite prescient.


Exercise 6.7 (Convexity and J-Convexity) 

Show that if $f:[a,b]\to\mathbb{R}$ is continuous and J-convex, then $f$ must
be convex in the modern sense expressed by the condition (6.1). As a
curiosity, we should note that there do exist J-convex functions that are 
not convex in the modern sense. Nevertheless, such functions are wildly 
discontinuous, and they are quite unlikely to turn up unless they are 
explicitly invited. 


Exercise 6.8 (A “One-liner” That Could Have Taken All Day) 
Show that for all 0 < x,y,z <1, one has the bound 


gee ae 2(y? — 1)(22 -1) <2 
— | x —_ eso . 
l+y l+z l1+a+y B = 


L(x, y, Z) 


Placed suggestively in a chapter on convexity, this problem is not much 
more than a one-liner, but in a less informative location, it might send 
one down a long trail of fruitless algebra. 
Exercise 6.9 (Hadwiger–Finsler Inequality)

For any triangle with the traditional labelling of Figure 6.2, the law of
cosines tells us that $a^2 = b^2 + c^2 - 2bc\cos\alpha$. Show that this law implies
the area formula

$$a^2 = (b - c)^2 + 4A\tan(\alpha/2),$$

then show how Jensen’s inequality implies that in any triangle one has

$$a^2 + b^2 + c^2 \ge (a - b)^2 + (b - c)^2 + (c - a)^2 + 4\sqrt{3}\,A.$$

This bound is known as the Hadwiger–Finsler inequality, and it provides
one of the nicest refinements of Weitzenböck’s inequality.
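Both displayed relations of this exercise can be spot-checked numerically before one attempts a proof. The sketch below generates triangles from their angles through the law of sines (with circumradius $1/2$, an arbitrary normalization, and area $A = \tfrac12 bc\sin\alpha$), then tests the area formula and the Hadwiger–Finsler bound; the equilateral triangle gives equality in the latter. Helper names are hypothetical.

```python
import math
import random


def triangle(alpha, beta):
    gamma = math.pi - alpha - beta
    a, b, c = math.sin(alpha), math.sin(beta), math.sin(gamma)  # law of sines with 2R = 1
    area = 0.5 * b * c * math.sin(alpha)
    return a, b, c, area


def hf_gap(a, b, c, area):
    """Left side minus right side of the Hadwiger-Finsler inequality."""
    rhs = (a - b) ** 2 + (b - c) ** 2 + (c - a) ** 2 + 4 * math.sqrt(3) * area
    return a * a + b * b + c * c - rhs


random.seed(5)
for _ in range(10000):
    alpha = random.uniform(0.01, math.pi - 0.02)
    beta = random.uniform(0.005, math.pi - alpha - 0.005)
    a, b, c, area = triangle(alpha, beta)
    assert math.isclose(a * a, (b - c) ** 2 + 4 * area * math.tan(alpha / 2),
                        rel_tol=1e-9, abs_tol=1e-12)
    assert hf_gap(a, b, c, area) >= -1e-12
```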


Exercise 6.10 (The $f''$ Criterion and Rolle’s Theorem)

We saw earlier (page 90) that the fundamental theorem of calculus
implies that if one has $f''(x) \ge 0$ for all $x \in [a,b]$, then $f$ is convex on
$[a,b]$. This exercise sketches how one can also prove this important fact
by estimating the difference $f(px_1 + qx_2) - pf(x_1) - qf(x_2)$ by comparison
with an appropriate polynomial.

(a) Take $0 < p < 1$, $q = 1 - p$ and set $\mu = px_1 + qx_2$ where $x_1 < x_2$.
Find the unique quadratic polynomial $Q(x)$ such that

$$Q(x_1) = f(x_1), \quad Q(x_2) = f(x_2), \quad \text{and} \quad Q(\mu) = f(\mu).$$

(b) Use the fact that $\Delta(x) = f(x) - Q(x)$ has three distinct zeros in
$[a,b]$ to show that there is an $x^*$ such that $\Delta''(x^*) = 0$.

(c) Finally, explain how $f''(x) \ge 0$ for all $x \in [a,b]$ and $\Delta''(x^*) = 0$
imply that $f(px_1 + qx_2) - pf(x_1) - qf(x_2) \le 0$.


Exercise 6.11 (Transformation to Achieve Convexity) 
Show that for positive $a$, $b$, and $c$ such that $a + b + c = abc$ one has

$$\frac{1}{\sqrt{1+a^2}} + \frac{1}{\sqrt{1+b^2}} + \frac{1}{\sqrt{1+c^2}} \le \frac{3}{2}.$$

This problem from the 1998 Korean National Olympiad is not easy, even
with the hint provided by the exercise’s title. Someone who is lucky may
draw a link between the hypothesis $a + b + c = abc$ and the reasonably
well-known fact that in a triangle labeled as in Figure 6.2 one has

$$\tan(\alpha) + \tan(\beta) + \tan(\gamma) = \tan(\alpha)\tan(\beta)\tan(\gamma).$$

This identity is easily checked by applying the addition formula for the
tangent to the sum $\gamma = \pi - (\alpha + \beta)$, but it is surely easier to remember
than to discover on the spot.
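Before hunting for the right transformation, one can at least test the claim numerically. Given $a$ and $b$ with $ab > 1$, the constraint $a + b + c = abc$ determines $c = (a + b)/(ab - 1) > 0$. The sketch below (sampling ranges and helper names are arbitrary choices) uses this parametrization and checks the bound, which becomes an equality at $a = b = c = \sqrt{3}$.

```python
import math
import random


def third_variable(a, b):
    # solve a + b + c = abc for c; positive exactly when ab > 1
    return (a + b) / (a * b - 1)


def korean_sum(a, b, c):
    return sum(1 / math.sqrt(1 + t * t) for t in (a, b, c))


random.seed(6)
for _ in range(10000):
    a = random.uniform(0.05, 20.0)
    b = random.uniform(1 / a + 0.01, 1 / a + 20.0)   # guarantees ab > 1
    c = third_variable(a, b)
    assert math.isclose(a + b + c, a * b * c, rel_tol=1e-9)
    assert korean_sum(a, b, c) <= 1.5 + 1e-12
```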
[Figure 6.5: A point $z$ outside of a closed bounded set $H$ determines a natural “viewing angle” $2\psi$; roots $r_1$, $r_2$, $r_3$ are marked.]

Fig. 6.5. The viewing angle $2\psi$ of the convex hull of the set of roots
$r_1, r_2, \ldots, r_n$ of $P(z)$ determines the parameter $\psi$ that one finds in Wilf’s
quantitative refinement of the Gauss–Lucas Theorem.


Exercise 6.12 (The Gauss—Lucas Theorem) 

Show that for any complex polynomial $P(z) = a_0 + a_1z + \cdots + a_nz^n$,
the roots of the derivative $P'(z)$ are contained in the convex hull $H$ of
the roots of $P(z)$.


Exercise 6.13 (Wilf’s Inequality) 
Show that if $H$ is the convex hull of the roots of the complex
polynomial $P(z) = a_0 + a_1z + \cdots + a_nz^n$, then one has

$$\left|\frac{a_n}{P(z)}\right|^{1/n} \le \frac{1}{n\cos\psi}\left|\frac{P'(z)}{P(z)}\right| \quad \text{for all } z \notin H, \qquad (6.26)$$

where the angle $\psi$ is defined by Figure 6.5. This inequality provides both
a new proof and a quantitative refinement of the classic Gauss–Lucas
Theorem of Exercise 6.12.


Exercise 6.14 (A Polynomial Lower Bound) 
Given that the zeros of the polynomial $P(z) = a_nz^n + \cdots + a_1z + a_0$
are contained in the unit disc $U = \{z : |z| \le 1\}$, show that one has

$$n|a_n|^{1/n}\,|P(z)|^{(n-1)/n}\sqrt{1 - |z|^{-2}} \le |P'(z)| \quad \text{for all } z \notin U. \qquad (6.27)$$


Exercise 6.15 (A Complex Mean Product Theorem) 
Show that if $0 < r < 1$ and if the complex numbers $z_1, z_2, \ldots, z_n$ are
in the disk $D = \{z : |z| \le r\}$, then there exists a $z_0 \in D$ such that

$$\prod_{j=1}^n (1 + z_j) = (1 + z_0)^n. \qquad (6.28)$$
Exercise 6.16 (Shapiro’s Cyclic Sum Inequality)
Show that for positive $a_1$, $a_2$, $a_3$, and $a_4$, one has the bound

$$2 \le \frac{a_1}{a_2 + a_3} + \frac{a_2}{a_3 + a_4} + \frac{a_3}{a_4 + a_1} + \frac{a_4}{a_1 + a_2}. \qquad (6.29)$$

Incidentally, the review of Bushell (1994) provides a great deal of
information about the inequalities of the form

$$\frac{n}{2} \le \frac{x_1}{x_2 + x_3} + \frac{x_2}{x_3 + x_4} + \cdots + \frac{x_{n-1}}{x_n + x_1} + \frac{x_n}{x_1 + x_2}.$$

This bound is known to fail for $n \ge 25$, yet the precise set of $n$ for which
it is valid has not yet been determined.
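A random search gives a quick feel for (6.29). The sketch below evaluates the general cyclic sum, written here as a hypothetical helper for any $n \ge 3$ with indices taken cyclically, at random positive points with $n = 4$; all-equal inputs give exactly $n/2$. A search of this kind can only support the bound, never prove it.

```python
import random


def cyclic_sum(xs):
    """Return x_1/(x_2+x_3) + ... + x_{n-1}/(x_n+x_1) + x_n/(x_1+x_2)."""
    n = len(xs)
    return sum(xs[i] / (xs[(i + 1) % n] + xs[(i + 2) % n]) for i in range(n))


random.seed(7)
smallest = min(cyclic_sum([random.uniform(0.01, 10.0) for _ in range(4)])
               for _ in range(20000))
print(smallest)  # smallest value seen in the search
assert smallest >= 2 - 1e-12
```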


Exercise 6.17 (The Three Chord Lemma) 


Show that if $f:[a,b]\to\mathbb{R}$ is convex and $a < x < b$, then one has

$$\frac{f(x) - f(a)}{x - a} \le \frac{f(b) - f(a)}{b - a} \le \frac{f(b) - f(x)}{b - x}. \qquad (6.30)$$


As the next two exercises suggest, this bound is the key to some of the 
most basic regularity properties of convex functions. 
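The ordering of the three slopes in (6.30) is easy to observe numerically. The sketch below (function choices, sampling, and the helper name are illustrative) checks it for a few convex functions at random points $a < x < b$, skipping nearly coincident points where rounding would dominate.

```python
import math
import random


def chord_slopes(f, a, x, b):
    left = (f(x) - f(a)) / (x - a)
    whole = (f(b) - f(a)) / (b - a)
    right = (f(b) - f(x)) / (b - x)
    return left, whole, right


random.seed(8)
for f in (math.exp, lambda t: t ** 4, lambda t: abs(t - 0.3)):
    for _ in range(1000):
        a, x, b = sorted(random.uniform(-2.0, 2.0) for _ in range(3))
        if x - a < 1e-3 or b - x < 1e-3:
            continue  # avoid near-zero denominators
        left, whole, right = chord_slopes(f, a, x, b)
        assert left <= whole + 1e-9 and whole <= right + 1e-9
```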


Exercise 6.18 (Near Differentiability of Convex Functions) 
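Use the Three Chord Lemma to show that for convex $f:[a,b]\to\mathbb{R}$
and $a < x < b$ one has the existence of the finite limits

$$f'_+(x) \overset{\text{def}}{=} \lim_{h \downarrow 0} \frac{f(x+h) - f(x)}{h} \quad\text{and}\quad f'_-(x) \overset{\text{def}}{=} \lim_{h \downarrow 0} \frac{f(x) - f(x-h)}{h}.$$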
Exercise 6.19 (Ratio Bounds and Linear Minorants)

For convex $f:[a,b]\to\mathbb{R}$ and $a < x < y < b$, show that one has

$$f'_+(x) \le \frac{f(y) - f(x)}{y - x} \le f'_-(y). \qquad (6.31)$$

In particular, note that for each $\theta \in [f'_-(x), f'_+(x)]$ one has the bound

$$f(y) \ge f(x) + (y - x)\theta \quad \text{for all } y \in [a,b]. \qquad (6.32)$$

The linear lower bound (6.32) is more effective than its simplicity would
suggest, and it has some notable consequences. In the next chapter
we will find that it yields an exceptionally efficient proof of Jensen’s
inequality.


7 


Integral Intermezzo 


The most fundamental inequalities are those for finite sums, but there 
can be no doubt that inequalities for integrals also deserve a fair share 
of our attention. Integrals are pervasive throughout science and engi- 
neering, and they also have some mathematical advantages over sums. 
For example, integrals can be cut up into as many pieces as we like, and 
integration by parts is almost always more graceful than summation by 
parts. Moreover, any integral may be reshaped into countless alternative 
forms by applying the change-of-variables formula. 

Each of these themes contributes to the theory of integral inequalities. 
These themes are also well illustrated by our favorite device — concrete 
challenge problems which have a personality of their own. 


Problem 7.1 (A Continuum of Compromise) 


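Show that for an integrable $f:\mathbb{R}\to\mathbb{R}$, one has the bound

$$\int_{-\infty}^{\infty} |f(x)|\,dx \le \sqrt{8}\,\Big(\int_{-\infty}^{\infty} |f(x)|^2\,dx\Big)^{1/4}\Big(\int_{-\infty}^{\infty} x^2|f(x)|^2\,dx\Big)^{1/4}. \qquad (7.1)$$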


A QUICK ORIENTATION AND A QUALITATIVE PLAN 


The one-fourth powers on the right side may seem strange, but they
are made more reasonable if one notes that each side of the inequality is
homogeneous of order one in $f$; that is, if $f$ is replaced by $\lambda f$ where $\lambda$ is
a positive constant, then each side is multiplied by $\lambda$. This observation
makes the inequality somewhat less strange, but one may still be stuck
for a good idea.

We faced such a predicament earlier where we found that one often 
does well to first consider a simpler qualitative challenge. Here the
natural candidate is to try to show that the left side is finite whenever both
integrals on the right are finite.

Once we ask this question, we are not likely to need long to think
of looking for separate bounds for the integral of $|f(x)|$ on the interval
$T = (-t, t)$ and its complement $T^c$. If we also ask ourselves how we
might introduce the term $|xf(x)|$, then we are almost forced to think of
using the splitting trick on the set $T^c$. Pursuing this thought, we then
find for all $t > 0$ that we have the bound

$$\begin{aligned}
\int_{-\infty}^{\infty} |f(x)|\,dx &= \int_T |f(x)|\,dx + \int_{T^c} |xf(x)|\,\frac{1}{|x|}\,dx\\
&\le (2t)^{1/2}\Big(\int_T |f(x)|^2\,dx\Big)^{1/2} + \Big(\frac{2}{t}\Big)^{1/2}\Big(\int_{T^c} x^2|f(x)|^2\,dx\Big)^{1/2},
\end{aligned} \qquad (7.2)$$

where in the second line we just applied Schwarz’s inequality twice.

This bound is not the one we hoped to prove, but it makes the same 
qualitative case. Specifically, it confirms that the integral of |f(«)| is 
finite when the bounding terms of the inequality (7.1) are finite. We 
now need to pass from our additive bound to one that is multiplicative, 
and we also need to exploit our free parameter t. 

We have no specific knowledge about the integrals over $T$ and $T^c$, so
there is almost no alternative to using the crude bound

$$\int_T f^2(x)\,dx \le \int_{\mathbb{R}} f^2(x)\,dx \stackrel{\text{def}}{=} A^2$$

and its cousin

$$\int_{T^c} x^2 f^2(x)\,dx \le \int_{\mathbb{R}} x^2 f^2(x)\,dx \stackrel{\text{def}}{=} B^2.$$

The right side of (7.2) is therefore bounded above by
$\phi(t) \stackrel{\text{def}}{=} 2^{1/2}t^{1/2}A + 2^{1/2}t^{-1/2}B$, and we can use calculus to minimize $\phi(t)$.
Since $\phi(t) \to \infty$ as $t \to 0$ or $t \to \infty$ and since $\phi'(t) = 0$ has the unique root
$t_0 = B/A$, we find $\min_{t>0} \phi(t) = \phi(t_0) = 8^{1/2}A^{1/2}B^{1/2}$, and this gives us precisely the bound
proposed by the challenge problem.
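The following minimal Python sketch checks the calculus step numerically. It uses one hypothetical choice, $f(x) = e^{-x^2}$, which is ours and not the text's, together with a hand-rolled midpoint rule. It confirms that $\phi$ is smallest at $t_0 = B/A$, where it equals $\sqrt{8}\,A^{1/2}B^{1/2}$, and that this value dominates $\int |f|$. A check on one example is, of course, no substitute for the proof.

```python
import math

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    step = (b - a) / n
    return step * sum(g(a + (i + 0.5) * step) for i in range(n))

# Hypothetical test function (not from the text): a Gaussian, truncated to
# [-L, L], where its tail is far below floating-point resolution.
def f(x):
    return math.exp(-x * x)

L = 10.0
lhs = integrate(lambda x: abs(f(x)), -L, L)                 # integral of |f|
A = math.sqrt(integrate(lambda x: f(x) ** 2, -L, L))        # A^2 = integral of f^2
B = math.sqrt(integrate(lambda x: (x * f(x)) ** 2, -L, L))  # B^2 = integral of x^2 f^2

def phi(t):
    """The additive bound sqrt(2t) A + sqrt(2/t) B obtained from (7.2)."""
    return math.sqrt(2 * t) * A + math.sqrt(2 / t) * B

t0 = B / A                              # the root of phi'(t) = 0
rhs = math.sqrt(8) * math.sqrt(A * B)   # 8^{1/2} A^{1/2} B^{1/2}, right side of (7.1)

print(f"integral of |f| = {lhs:.6f}")
print(f"phi(t0) = {phi(t0):.6f}, sqrt(8) (AB)^(1/2) = {rhs:.6f}")
for t in (0.25 * t0, 0.5 * t0, 2 * t0, 4 * t0):
    print(f"phi({t:.3f}) = {phi(t):.6f}")
```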


DISSECTIONS AND BENEFITS OF THE CONTINUUM 


The inequality (7.1) came to us with only a faint hint that one might 
do well to cut the target integral into the piece over $T = (-t, t)$ and the
piece over $T^c$, yet once this dissection was performed, the solution came
to us quickly. The impact of dissection is usually less dramatic, but on 
a qualitative level at least, dissection can be counted upon as one of the 
most effective devices we have for estimation of integrals. 
Here our use of a flexible, parameter-driven dissection also helped us
to take advantage of the intrinsic richness of the continuum. Without a
pause, we were led to the problem of minimizing $\phi(t)$, and this turned
out to be a simple calculus exercise. It is far less common for a discrete
problem to crack so easily; even if one finds the analogs of $t$ and $\phi(t)$,
the odds are high that the resulting discrete minimization problem will 
be a messy one. 


BEATING SCHWARZ BY TAKING A DETOUR 


Many problems of mathematical analysis call for a bound that beats 
the one which we get from an immediate application of Schwarz’s in- 
equality. Such a refinement may require a subtle investigation, but 
sometimes the critical improvement only calls for one to exercise some 
creative self-restraint. A useful motto to keep in mind is “Transform- 
Schwarz-Invert,” but to say any more might give away the solution to 
the next challenge problem. 


Problem 7.2 (Doing Better Than Schwarz) 

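Show that if $f: [0,\infty) \to [0,\infty)$ is a continuous, nonincreasing function
which is differentiable on $(0,\infty)$, then for any pair of parameters
$0 < \alpha, \beta < \infty$, the integral

$$I = \int_0^\infty x^{\alpha+\beta} f(x)\,dx \tag{7.3}$$

satisfies the bound

$$I^2 \le \left\{1 - \left(\frac{\alpha-\beta}{\alpha+\beta+1}\right)^2\right\} \int_0^\infty x^{2\alpha} f(x)\,dx \int_0^\infty x^{2\beta} f(x)\,dx. \tag{7.4}$$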


What makes this inequality instructive is that the direct application
of Schwarz’s inequality to the splitting

$$x^{\alpha+\beta} f(x) = x^{\alpha}\sqrt{f(x)}\cdot x^{\beta}\sqrt{f(x)}$$


would give one a weaker inequality where the first factor on the right- 
hand side of the bound (7.4) would be replaced by 1. The essence of 
the challenge is therefore to beat the naive immediate application of 
Schwarz’s inequality. 


TAKING THE HINT 


If we want to apply the pattern of “Transform-Schwarz-Invert,” we 
need to think of ways we might transform the integral (7.3), and, from 
the specified hypotheses, the natural transformation is simply integration
by parts. To explore the feasibility of this idea we first note that
by the continuity of $f$ we have $x^{\gamma+1} f(x) \to 0$ as $x \to 0$, so integration
by parts provides the nice formula

$$\int_0^\infty x^{\gamma} f(x)\,dx = \frac{1}{\gamma+1}\int_0^\infty x^{\gamma+1}|f'(x)|\,dx, \tag{7.5}$$

provided that we also have

$$x^{\gamma+1} f(x) \to 0 \quad\text{as } x \to \infty. \tag{7.6}$$


Before we worry about checking this limit (7.6), we should first see if 
the formula (7.5) actually helps. 

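If we first apply the formula (7.5) to the integral $I$ of the challenge
problem, we have $\gamma = \alpha + \beta$ and

$$(\alpha+\beta+1)\,I = \int_0^\infty x^{\alpha+\beta+1}|f'(x)|\,dx.$$

Thus, if we then apply Schwarz’s inequality to the splitting

$$x^{\alpha+\beta+1}|f'(x)| = x^{\alpha+1/2}|f'(x)|^{1/2} \cdot x^{\beta+1/2}|f'(x)|^{1/2},$$

we find the nice intermediate bound

$$(\alpha+\beta+1)^2 I^2 \le \int_0^\infty x^{2\alpha+1}|f'(x)|\,dx \int_0^\infty x^{2\beta+1}|f'(x)|\,dx.$$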


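Now we see how we can invert; we just apply integration by parts (7.5)
to each of the last two integrals to obtain

$$(\alpha+\beta+1)^2 I^2 \le (2\alpha+1)(2\beta+1)\int_0^\infty x^{2\alpha} f(x)\,dx \int_0^\infty x^{2\beta} f(x)\,dx.$$

Here, at last, we find after just a little algebraic manipulation of the first
factor, namely

$$\frac{(2\alpha+1)(2\beta+1)}{(\alpha+\beta+1)^2} = \frac{(\alpha+\beta+1)^2 - (\alpha-\beta)^2}{(\alpha+\beta+1)^2} = 1 - \left(\frac{\alpha-\beta}{\alpha+\beta+1}\right)^2,$$

that we do indeed have the inequality of the challenge problem.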
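For the exponential $f(x) = e^{-x}$, a hypothetical example of ours that satisfies the hypotheses of Problem 7.2, every integral in (7.4) is a Gamma-function value, $\int_0^\infty x^{\gamma}e^{-x}\,dx = \Gamma(\gamma+1)$. That makes the refinement easy to check in a few lines of Python. The sketch also checks that the factor coming from integration by parts agrees with the one stated in (7.4).

```python
import math

# Hypothetical example (not from the text): f(x) = exp(-x) is continuous,
# nonincreasing, and differentiable on (0, inf). Its moments are Gamma values:
# integral_0^inf x**g * exp(-x) dx = Gamma(g + 1).
def moment(g):
    return math.gamma(g + 1)

def compare(alpha, beta):
    """Return I^2, the refined bound (7.4), and the naive Schwarz bound."""
    I = moment(alpha + beta)                        # the integral (7.3)
    product = moment(2 * alpha) * moment(2 * beta)  # product of the two moments in (7.4)
    factor = 1 - ((alpha - beta) / (alpha + beta + 1)) ** 2
    return I ** 2, factor * product, product

def factor_from_ibp(alpha, beta):
    """The same factor as it emerges from integration by parts."""
    return (2 * alpha + 1) * (2 * beta + 1) / (alpha + beta + 1) ** 2

for alpha, beta in [(1.0, 3.0), (0.5, 0.5), (0.2, 4.0)]:
    lhs, refined, naive = compare(alpha, beta)
    print(f"alpha={alpha}, beta={beta}: I^2 = {lhs:.4f}, "
          f"refined bound = {refined:.4f}, naive bound = {naive:.4f}")
```

When $\alpha = \beta$ the two bounds coincide, because the factor in (7.4) then equals 1. The gain appears only when the exponents differ.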

Our solution is therefore complete except for one small point; we still 
need to check that our three applications of the integration by parts 
formula (7.5) were justified. For this it suffices to show that we have
the limit (7.6) when $\gamma$ equals $2\alpha$, $2\beta$, or $\alpha+\beta$, and it clearly suffices
to check the limit for the largest of these, which we can take to be
$2\alpha$. Moreover, we can assume that, in addition to the hypotheses of the
challenge problem, we also have the condition

$$\int_0^\infty x^{2\alpha} f(x)\,dx < \infty, \tag{7.7}$$

since otherwise our target inequality (7.4) is trivial.
A POINTWISE INFERENCE


These considerations present an amusing intermediate problem; we 
need to prove a pointwise condition (7.6) with an integral hypothesis 
(7.7). It is useful to note that such an inference would be impossible 
here without the additional information that f is monotone decreasing. 

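We need to bring the value of $f$ at a fixed point into clear view, and
here it is surely useful to note that for any $0 < t < \infty$ we have

$$\begin{aligned} \int_0^t x^{2\alpha} f(x)\,dx &= \int_0^t \frac{d}{dx}\left(\frac{x^{2\alpha+1}}{2\alpha+1}\right) f(x)\,dx \\ &= \frac{t^{2\alpha+1}f(t)}{2\alpha+1} + \frac{1}{2\alpha+1}\int_0^t x^{2\alpha+1}|f'(x)|\,dx \\ &\ge \frac{1}{2\alpha+1}\int_0^t x^{2\alpha+1}|f'(x)|\,dx. \end{aligned} \tag{7.8}$$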


By the hypothesis (7.7) the first integral has a finite limit as $t \to \infty$, so
the last integral also has a finite limit as $t \to \infty$. From the identity (7.8)
we see that $f(t)t^{2\alpha+1}/(2\alpha+1)$ is the difference of these integrals, so we
find that there exists a constant $0 \le c < \infty$ such that

$$\lim_{t\to\infty} t^{2\alpha+1} f(t) = c. \tag{7.9}$$
Now, if $c > 0$, then there is a $T$ such that $t^{2\alpha+1} f(t) \ge c/2$ for $t \ge T$,
and in this case one would have

$$\int_0^\infty x^{2\alpha} f(x)\,dx \ge \int_T^\infty \frac{c}{2}\,x^{-1}\,dx = \infty. \tag{7.10}$$
Since this bound contradicts our assumption (7.7), we find that c = 0, 
and this fact confirms that our three applications of the integration by 
parts formula (7.5) were justified. 


ANOTHER POINTWISE CHALLENGE 


In the course of the preceding challenge problem, we noted that the 
monotonicity assumption on f was essential, yet one can easily miss the 
point in the proof where that hypothesis was applied. It came in quietly 
on the line (7.8) where the integration by parts formula was restructured
to express $f(t)t^{2\alpha+1}$ as the difference of two integrals with finite limits.

One of the recurring challenges of mathematical analysis is the ex- 
traction of local, pointwise information about a function from aggregate 
information which is typically expressed with the help of integrals. If 
one does not know something about the way or the rate at which the 
function changes, the task is usually impossible. In some cases one can 
succeed with just aggregate information about the rate of change. The
next challenge problem provides an instructive example. 


Problem 7.3 (A Pointwise Bound) 
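Show that if $f: [0,\infty) \to \mathbb{R}$ satisfies the two integral bounds

$$\int_0^\infty x^2|f(x)|^2\,dx < \infty \quad\text{and}\quad \int_0^\infty |f'(x)|^2\,dx < \infty,$$

then for all $x > 0$ one has the inequality

$$x f^2(x) \le 4\left\{\int_x^\infty t^2|f(t)|^2\,dt\right\}^{1/2}\left\{\int_x^\infty |f'(t)|^2\,dt\right\}^{1/2}, \tag{7.11}$$

and, consequently, $\sqrt{x}\,|f(x)| \to 0$ as $x \to \infty$.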


ORIENTATION AND A PLAN 


In this problem, as in many others, we must find a way to get started 
even though we do not have a clear idea how we might eventually reach 
our goal. Our only guide here is that we know we must relate f’ to f, 
and thus we may suspect that the fundamental theorem of calculus will 
somehow help. 

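This is The Cauchy-Schwarz Master Class, so here one may not need
long to think of applying the 1-trick and Schwarz’s inequality to get the
bound

$$|f(x+t) - f(x)| = \left|\int_x^{x+t} f'(u)\,du\right| \le t^{1/2}\left\{\int_x^{x+t} |f'(u)|^2\,du\right\}^{1/2}.$$

In fact, this estimate gives us both an upper bound

$$|f(x+t)| \le |f(x)| + t^{1/2}\left\{\int_x^\infty |f'(u)|^2\,du\right\}^{1/2} \tag{7.12}$$

and a lower bound

$$|f(x+t)| \ge |f(x)| - t^{1/2}\left\{\int_x^\infty |f'(u)|^2\,du\right\}^{1/2}, \tag{7.13}$$

and each of these offers a sense of progress. After all, we needed to find
roles for both of the integrals

$$F^2(x) \stackrel{\text{def}}{=} \int_x^\infty u^2|f(u)|^2\,du \quad\text{and}\quad D^2(x) \stackrel{\text{def}}{=} \int_x^\infty |f'(u)|^2\,du,$$

and now we at least see how $D(x)$ can play a part.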
When we look for a way to relate F(x) and D(x), it is reasonable to 
think of using D(x) and our bounds (7.12) and (7.13) to build upper and
lower estimates for F(x). To be sure, it is not clear that such estimates 
will help us with our challenge problem, but there is also not much else 
we can do. 

After some exploration, one does discover that it is the trickier lower 
estimate which brings home the prize. To see how this goes, we first 
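note that for any value of $h > 0$ such that $h^{1/2} \le |f(x)|/D(x)$ one has

$$\begin{aligned} F^2(x) &\ge \int_x^{x+h} u^2|f(u)|^2\,du = \int_0^h (x+t)^2|f(x+t)|^2\,dt \\ &\ge \int_0^h x^2\{|f(x)| - t^{1/2}D(x)\}^2\,dt \\ &\ge h x^2\{|f(x)| - h^{1/2}D(x)\}^2, \end{aligned}$$

or, a bit more simply, we have

$$F(x) \ge h^{1/2}x\{|f(x)| - h^{1/2}D(x)\}.$$

To maximize this lower bound we take $h^{1/2} = |f(x)|/\{2D(x)\}$, and we find

$$\frac{x f^2(x)}{4D(x)} \le F(x) \quad\text{or}\quad x f^2(x) \le 4F(x)D(x),$$

just as we were challenged to show.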
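Here is a small numerical illustration. It assumes the hypothetical choice $f(x) = e^{-x}$, for which $F^2(x) = e^{-2x}(x^2/2 + x/2 + 1/4)$ and $D^2(x) = e^{-2x}/2$ in closed form. It evaluates both sides of (7.11) and the lower estimate for $F^2(x)$ at the optimal $h$, at a few points.

```python
import math

# Hypothetical example (not from the text): f(x) = exp(-x), for which both
# integral hypotheses of Problem 7.3 hold.
def f(x):
    return math.exp(-x)

def F2(x):
    """F^2(x) = integral_x^inf u^2 f(u)^2 du, in closed form for f(u) = exp(-u)."""
    return math.exp(-2 * x) * (x * x / 2 + x / 2 + 0.25)

def D2(x):
    """D^2(x) = integral_x^inf f'(u)^2 du = exp(-2x) / 2 for f(u) = exp(-u)."""
    return math.exp(-2 * x) / 2

def check_point(x):
    """Return (x f^2(x), 4 F(x) D(x), lower estimate of F^2(x) at the optimal h)."""
    F, D = math.sqrt(F2(x)), math.sqrt(D2(x))
    root_h = abs(f(x)) / (2 * D)          # optimal choice h^{1/2} = |f(x)| / (2 D(x))
    lower = root_h ** 2 * x * x * (abs(f(x)) - root_h * D) ** 2
    return x * f(x) ** 2, 4 * F * D, lower

for x in (0.5, 2.0, 10.0):
    lhs, rhs, lower = check_point(x)
    print(f"x={x}: x f^2 = {lhs:.3e} <= 4FD = {rhs:.3e}; F^2 = {F2(x):.3e} >= {lower:.3e};"
          f" sqrt(x)|f(x)| = {math.sqrt(x) * abs(f(x)):.3e}")
```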


PERSPECTIVE ON LOCALIZATION 


The two preceding problems required us to extract pointwise estimates 
from integral estimates, and this is often a subtle task. More commonly 
one faces the simpler challenge of converting an estimate for one type 
of integral into an estimate for another type of integral. We usually do 
not have derivatives at our disposal, yet we may still be able to exploit 
local estimates for global purposes. 


Problem 7.4 (A Divergent Integral) 
Given $f: [1,\infty) \to (0,\infty)$ and a constant $c > 0$, show that if

$$\int_1^t f(x)\,dx \le c\,t^2 \quad\text{for all } 1 \le t < \infty \quad\text{then}\quad \int_1^\infty \frac{dx}{f(x)} = \infty.$$


AN IDEA THAT DOES NOT QUITE WORK


Given our experiences with sums of reciprocals (e.g., Exercise 1.2, 
page 12), it is natural to think of applying Schwarz’s inequality to the 
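splitting $1 = \sqrt{f(x)}\cdot\{1/\sqrt{f(x)}\}$. This suggestion leads us to

$$(t-1)^2 = \left\{\int_1^t \sqrt{f(x)}\,\frac{1}{\sqrt{f(x)}}\,dx\right\}^2 \le \int_1^t f(x)\,dx \int_1^t \frac{dx}{f(x)}, \tag{7.14}$$

so, by our hypothesis we find

$$\frac{(t-1)^2}{c\,t^2} \le \int_1^t \frac{dx}{f(x)},$$

and when we let $t \to \infty$ we find the bound

$$\frac{1}{c} \le \int_1^\infty \frac{dx}{f(x)}. \tag{7.15}$$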


Since we were challenged to show that the last integral is infinite, we 
have fallen short of our goal. Once more we need to find some way to 
sharpen Schwarz. 


FOCUSING WHERE ONE DOES WELL 


When Schwarz’s inequality disappoints us, we often do well to ask 
how our situation differs from the case when Schwarz’s inequality is 
at its best. Here we applied Schwarz’s inequality to the product of
$\phi(x) = \sqrt{f(x)}$ and $\psi(x) = 1/\sqrt{f(x)}$, and we know that Schwarz’s inequality
is sharp if and only if $\phi(x)$ and $\psi(x)$ are proportional. Since $\sqrt{f(x)}$ and
$1/\sqrt{f(x)}$ are far from proportional on the infinite interval $[1,\infty)$, we get
a mild hint: perhaps we can do better if we restrict our application of
Schwarz’s inequality to the corresponding integrals over appropriately
chosen finite intervals $[A, B]$.

When we repeat our earlier calculation for a generic interval $[A, B]$
with $1 \le A < B$, we find

$$(B - A)^2 \le \int_A^B f(x)\,dx \int_A^B \frac{dx}{f(x)}, \tag{7.16}$$

and, now, we cannot do much better in our estimate of the first integral
than to exploit our hypothesis via the crude bound

$$\int_A^B f(x)\,dx \le \int_1^B f(x)\,dx \le cB^2,$$

after which inequality (7.16) gives us

$$\frac{(B-A)^2}{cB^2} \le \int_A^B \frac{dx}{f(x)}. \tag{7.17}$$

The issue now is to see if perhaps the flexibility of the parameters A and 
B can be of help. 
This turns out to be a fruitful idea. If we take $A = 2^j$ and $B = 2^{j+1}$,
then for all $0 \le j < \infty$ we have

$$\frac{1}{4c} \le \int_{2^j}^{2^{j+1}} \frac{dx}{f(x)},$$

and if we sum these estimates over $0 \le j \le k$ we find

$$\frac{k+1}{4c} \le \int_1^{2^{k+1}} \frac{dx}{f(x)}. \tag{7.18}$$

Since $k$ is arbitrary, the last inequality does indeed complete the solution
to our fourth challenge problem.
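The dyadic argument can be watched at work numerically. The sketch below assumes the hypothetical choice $f(x) = x(2 + \sin x)$. Because $f(x) \le 3x$, it satisfies the hypothesis with $c = 3/2$. The sketch computes $\int dx/f(x)$ over each block $[2^j, 2^{j+1}]$ with a midpoint rule and compares the blocks and their partial sums with $1/(4c)$ and $(k+1)/(4c)$ from (7.18).

```python
import math

# Hypothetical example (not from the text): f(x) = x (2 + sin x) on [1, inf).
# Since f(x) <= 3x, integral_1^t f(x) dx <= 3(t^2 - 1)/2 <= c t^2 with c = 3/2.
c = 1.5

def f(x):
    return x * (2 + math.sin(x))

def integrate(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    step = (b - a) / n
    return step * sum(g(a + (i + 0.5) * step) for i in range(n))

# Integral of 1/f over each dyadic block [2^j, 2^(j+1)], j = 0, ..., 10.
blocks = [integrate(lambda x: 1 / f(x), 2.0 ** j, 2.0 ** (j + 1)) for j in range(11)]

partial = 0.0
for k, block in enumerate(blocks):
    partial += block
    # Compare with 1/(4c) per block and with (k + 1)/(4c) from (7.18).
    print(f"k={k:2d}: block = {block:.4f} (>= {1 / (4 * c):.4f}), "
          f"partial sum = {partial:.4f} (>= {(k + 1) / (4 * c):.4f})")
```

Each block contributes at least a fixed amount, so the partial sums grow at least linearly in $k$. That linear growth is exactly the mechanism behind (7.18).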
A FINAL PROBLEM: JENSEN’S INEQUALITY FOR INTEGRALS 


The last challenge problem could be put simply: “Prove an integral 
version of Jensen’s inequality.” Naturally, we can also take this oppor- 
tunity to add something extra to the pot. 


Problem 7.5 (Jensen’s Inequality: An Integral Version) 
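Show that for each interval $I \subset \mathbb{R}$ and each convex $\Phi: I \to \mathbb{R}$, one
has the bound

$$\Phi\left(\int_D h(x)\,w(x)\,dx\right) \le \int_D \Phi(h(x))\,w(x)\,dx, \tag{7.19}$$

for each $h: D \to I$ and each weight function $w: D \to (0,\infty)$ such that

$$\int_D w(x)\,dx = 1.$$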


THE OPPORTUNITY TO TAKE A GEOMETRIC PATH 


We could prove the conjectured inequality (7.19) by working our way 
up from Jensen’s inequality for finite sums, but it is probably more 
instructive to take a hint from Figure 7.1. If we compare the figure to
our target inequality and if we ask ourselves about reasonable choices
for $\mu$, one candidate which is sure to make our list is

$$\mu = \int_D h(x)\,w(x)\,dx;$$

after all, $\Phi(\mu)$ is already present in the inequality (7.19).

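If $\theta$ denotes the slope of the support line pictured in Figure 7.1, then
the figure tells us that $\Phi(\mu) + (t - \mu)\theta \le \Phi(t)$ for all $t \in I$. Noting that
the parameter $t$ is still at our disposal, we now see that $\Phi(h(x))$ may be
brought into action if we set $t = h(x)$, and this gives us the bound

$$\Phi(\mu) + (h(x) - \mu)\theta \le \Phi(h(x)) \quad\text{for all } x \in D. \tag{7.20}$$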
[Figure 7.1. Margin note: The linear lower bound is often more powerful than one might guess.]

Fig. 7.1. For each point $p = (\mu, \Phi(\mu))$ on the graph of a convex function $\Phi$,
there is a line through $p$ which never goes above the graph of $\Phi$. If $\Phi$ is
differentiable, the slope $\theta$ of this line is $\Phi'(\mu)$, and if $\Phi$ is not differentiable,
then according to Exercise 6.19 one can take $\theta$ to be any point in the interval
$[\Phi'_-(\mu), \Phi'_+(\mu)]$ determined by the left and right derivatives.


If we multiply the bound (7.20) by the weight factor $w(x)$ and integrate,
then the conjectured bound (7.19) falls straight into our hands because
of the relation

$$\int_D (h(x) - \mu)\,\theta\,w(x)\,dx = \theta\left\{\int_D h(x)\,w(x)\,dx - \mu\int_D w(x)\,dx\right\} = 0.$$
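To see the support-line argument in numbers, here is a minimal sketch with hypothetical data of our own: $D = [0,1]$, weight $w(x) = 2x$, $h(x) = 3\sin(2\pi x)$, and the convex function $\Phi = \exp$. Here the support slope is $\theta = \Phi'(\mu) = e^{\mu}$. The sketch checks three things: (7.20) pointwise on a grid, the vanishing of the integrated linear term, and the resulting inequality (7.19).

```python
import math

def integrate(g, a=0.0, b=1.0, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    step = (b - a) / n
    return step * sum(g(a + (i + 0.5) * step) for i in range(n))

# Hypothetical data (not from the text): D = [0, 1], weight w(x) = 2x with
# integral 1, h(x) = 3 sin(2 pi x), and the convex function Phi = exp.
def w(x):
    return 2 * x

def h(x):
    return 3 * math.sin(2 * math.pi * x)

Phi = math.exp
mu = integrate(lambda x: h(x) * w(x))          # the weighted average of h
theta = math.exp(mu)                           # slope of the support line: Phi'(mu)

lhs = Phi(mu)                                  # left side of (7.19)
rhs = integrate(lambda x: Phi(h(x)) * w(x))    # right side of (7.19)
# The linear term of (7.20), which integrates to zero against w.
zero_term = theta * integrate(lambda x: (h(x) - mu) * w(x))
print(f"mu = {mu:.6f}, Phi(mu) = {lhs:.6f} <= {rhs:.6f}, zero term = {zero_term:.1e}")
```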


PERSPECTIVES AND COROLLARIES 


Many integral inequalities can be proved by a two-step pattern where 
one proves a pointwise inequality and then one integrates. As the proof 
of Jensen’s inequality suggests, this pattern is particularly effective when 
the pointwise bound contains a nontrivial term which has integral zero. 

There are many corollaries of the continuous version of Jensen’s in- 
equality, but probably none of these is more important than the one we 
obtain by taking $\Phi(x) = e^x$ and by replacing $h(x)$ by $\log h(x)$. In this
case, we find the bound

$$\exp\left(\int_D \log\{h(x)\}\,w(x)\,dx\right) \le \int_D h(x)\,w(x)\,dx, \tag{7.21}$$


which is the natural integral analogue of the arithmetic-geometric mean 
inequality. 

To make the connection explicit, one can set $h(x) = a_k > 0$ on $[k-1,k)$
and set $w(x) = p_k > 0$ on $[k-1,k)$ for $1 \le k \le n$. One then finds that
for $p_1 + p_2 + \cdots + p_n = 1$ the bound (7.21) reduces exactly to the
classic AM-GM bound,

$$\prod_{k=1}^n a_k^{p_k} \le \sum_{k=1}^n p_k a_k. \tag{7.22}$$
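The step-function reduction can be checked directly. In the following sketch the values $a_k$ and weights $p_k$ are hypothetical. The sketch lays them out as step functions on $[0, n)$ and confirms that the two sides of (7.21) coincide with the two sides of (7.22).

```python
import math

# Hypothetical data (not from the text): values a_k > 0 and weights p_k > 0
# with total mass 1, laid out as step functions on [0, n).
a = [0.5, 2.0, 3.0, 7.0]
p = [0.1, 0.4, 0.3, 0.2]

def h(x):
    return a[int(x)]    # h(x) = a_k on [k - 1, k), with 0-based list index int(x)

def w(x):
    return p[int(x)]    # w(x) = p_k on [k - 1, k)

def integrate(g, per_unit=1000):
    """Midpoint rule on [0, n). The midpoints never hit an integer, so for these
    step functions the rule is exact up to rounding."""
    step = 1.0 / per_unit
    return step * sum(g((i + 0.5) * step) for i in range(len(a) * per_unit))

geo_from_integral = math.exp(integrate(lambda x: math.log(h(x)) * w(x)))  # left side of (7.21)
arith_from_integral = integrate(lambda x: h(x) * w(x))                    # right side of (7.21)
geo = math.prod(ak ** pk for ak, pk in zip(a, p))                         # left side of (7.22)
arith = sum(pk * ak for ak, pk in zip(a, p))                              # right side of (7.22)
print(geo_from_integral, geo, arith_from_integral, arith)
```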


Incidentally, the integral analog (7.21) of the AM-GM inequality (7.22) 
has a long and somewhat muddy history. Apparently, the inequality was 
first recorded (for w(x) = 1) by none other than V. Y. Bunyakovsky. It 
even appears in the famous Mémoire 1859 where Bunyakovsky intro- 
duced his integral analog of Cauchy’s inequality. Nevertheless, in this 
case, Bunyakovsky’s contribution seems to have been forgotten even by 
the experts. 


EXERCISES 


Exercise 7.1 (Integration of a Well-Chosen Pointwise Bound) 
Many significant integral inequalities can be proved by integration of 
an appropriately constructed pointwise bound. For example, the integral 
version (7.19) of Jensen’s inequality was proved this way. 
For a more flexible example, show that there is a pointwise integration
proof of Schwarz’s inequality which flows directly from the symmetrizing
substitutions

$$u \mapsto f(x)g(y) \quad\text{and}\quad v \mapsto f(y)g(x)$$

and the familiar bound $2uv \le u^2 + v^2$.


Exercise 7.2 (A Centered Version of Schwarz’s Inequality) 

If $w(x) > 0$ for all $x \in \mathbb{R}$ and if the integral of $w$ over $\mathbb{R}$ is equal to 1,
then the weighted average of a (suitably integrable) function $f: \mathbb{R} \to \mathbb{R}$
is defined by the formula

$$A(f) = \int_{-\infty}^{\infty} f(x)\,w(x)\,dx.$$

Show that for functions $f$ and $g$, one has the following bound on the
average of their product,

$$\{A(fg) - A(f)A(g)\}^2 \le \{A(f^2) - A^2(f)\}\{A(g^2) - A^2(g)\},$$


provided that all of the indicated integrals are well defined. 
This inequality, like other variations of the Cauchy and Schwarz in- 
equalities, owes its usefulness to its ability to help us convert information 
on two individual functions to information about their product. Here
we see that the average of the product, $A(fg)$, cannot differ too greatly
from the product of the averages, $A(f)A(g)$, provided that the variance
terms, $A(f^2) - A^2(f)$ and $A(g^2) - A^2(g)$, are not too large.


Exercise 7.3 (A Tail and Smoothness Bound) 
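Show that if $f: \mathbb{R} \to \mathbb{R}$ has a continuous derivative then

$$\int_{-\infty}^{\infty} |f(x)|^2\,dx \le 2\left(\int_{-\infty}^{\infty} x^2|f(x)|^2\,dx\right)^{1/2}\left(\int_{-\infty}^{\infty} |f'(x)|^2\,dx\right)^{1/2}.$$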


Exercise 7.4 (Reciprocal on a Square) 
Show that for $a > 0$ and $b > 0$ one has the bound

$$\frac{1}{a+b+1} < \int_a^{a+1}\!\!\int_b^{b+1} \frac{dx\,dy}{x+y},$$

which is a modest, but useful, improvement on the naive lower
bound $1/(a+b+2)$ which one gets by minimizing the integrand.


Exercise 7.5 (Estimates via Integral Representations) 


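The complicated formula for the derivative

$$\frac{d^4}{dt^4}\,\frac{\sin t}{t} = \frac{\sin t}{t} + \frac{4\cos t}{t^2} - \frac{12\sin t}{t^3} - \frac{24\cos t}{t^4} + \frac{24\sin t}{t^5}$$

may make one doubt the possibility of proving a simple bound such as

$$\left|\frac{d^4}{dt^4}\,\frac{\sin t}{t}\right| \le \frac{1}{5} \quad\text{for all } t \in \mathbb{R}. \tag{7.23}$$

Nevertheless, this bound and its generalization for the $n$-fold derivative
are decidedly easy if one thinks of using the integral representation

$$\frac{\sin t}{t} = \int_0^1 \cos(st)\,ds. \tag{7.24}$$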
0 


Show how the representation (7.24) may be used to prove the bound 
(7.23), and give at least one further example of a problem where an 
analogous integral representation may be used in this way. The moral 
of this story is that many apparently subtle quantities can be estimated 
efficiently if they can first be represented as integrals. 
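As a numerical illustration, and not a proof, the following sketch compares the explicit fourth-derivative formula with the integral obtained by differentiating (7.24) four times under the integral sign, $\int_0^1 s^4\cos(st)\,ds$. That integral is plainly bounded in absolute value by $\int_0^1 s^4\,ds = 1/5$. The quadrature size is an arbitrary choice of ours.

```python
import math

def fourth_derivative_closed(t):
    """The closed form of d^4/dt^4 (sin t / t); it loses accuracy near t = 0."""
    s, c = math.sin(t), math.cos(t)
    return s / t + 4 * c / t**2 - 12 * s / t**3 - 24 * c / t**4 + 24 * s / t**5

def fourth_derivative_integral(t, n=20_000):
    """Differentiating (7.24) four times under the integral sign gives
    integral_0^1 s^4 cos(st) ds, approximated here by the midpoint rule."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * step
        total += s**4 * math.cos(s * t)
    return step * total

for t in (0.0, 1.0, 5.0, 20.0):
    value = fourth_derivative_integral(t)
    closed = fourth_derivative_closed(t) if t != 0 else float("nan")
    print(f"t={t:5.1f}: integral form = {value:+.6f}, closed form = {closed:+.6f}, bound 1/5")
```

The integral form is also the better way to compute near $t = 0$, where the closed form suffers from cancellation among its large terms.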
Exercise 7.6 (Confirmation by Improvement)

Confirm your mastery of the fourth challenge problem (page 111) by 
showing that you can get the same conclusion from a weaker hypothesis. 
For example, show that if there is a constant $0 < c < \infty$ such that the
function $f: [1,\infty) \to (0,\infty)$ satisfies the bound

$$\int_1^t f(x)\,dx \le c\,t^2\log t, \tag{7.25}$$

then one still has divergence of the reciprocal integral

$$\int_1^\infty \frac{dx}{f(x)} = \infty.$$


Exercise 7.7 (Triangle Lower Bound) 
Suppose the function $f: [0,\infty) \to [0,\infty)$ is convex on $[T,\infty)$ and
show that for all $t \ge T$ one has

$$\frac{f^2(t)}{2|f'(t)|} \le \int_t^\infty f(u)\,du. \tag{7.26}$$

This is called the triangle lower bound, and it is often applied in probability
theory. For example, if we take $f(u) = e^{-u^2/2}/\sqrt{2\pi}$ then it gives
the lower bound

$$\frac{e^{-t^2/2}}{2t\sqrt{2\pi}} \le \frac{1}{\sqrt{2\pi}}\int_t^\infty e^{-u^2/2}\,du \quad\text{for } t \ge 1,$$

although one can do a little better in this specific case.


Exercise 7.8 (The Slip-in Trick: Two Examples) 
(a) Show that for all n = 1,2,... one has the lower bound 


(b) Show that for all $x > 0$ one has the upper bound

$$\int_x^\infty e^{-u^2/2}\,du \le \frac{1}{x}\,e^{-x^2/2}.$$


No one should pass up this problem. The “slip-in trick” is one of the 
most versatile tools we have for the estimation of integrals and sums; to 
be unfamiliar with it would be to suffer an unnecessary handicap. 
[Figure 7.2. Margin note: A favorite example of J.E. Littlewood which illustrates the legitimacy of pictorial arguments.]

Fig. 7.2. Consider a function $g(x)$ for which $|g'(x)| \le B$, so $g$ cannot change
too rapidly. If $g(x_0) = P > 0$ for some $x_0$, then there is a certain triangle
which must lie under the graph of $g$. This observation reveals an important
relation between $g$, $g'$, and the integral of $g$.


Exercise 7.9 (Littlewood’s Middle Derivative Squeeze) 
Show that if $f: [0,\infty) \to \mathbb{R}$ is twice differentiable and if $|f''(x)|$ is
bounded, then

$$\lim_{x\to\infty} f(x) = 0 \quad\text{implies}\quad \lim_{x\to\infty} f'(x) = 0.$$


In his Miscellany, J.E. Littlewood suggests that “pictorial arguments, 
while not so purely conventional, can be quite legitimate.” The result 
of this exercise is his leading example, and the picture he offered is 
essentially that of Figure 7.2. 


Exercise 7.10 (Monotonicity and Integral Estimates) 

Although the point was not stressed in this chapter, many of the 
most useful day-to-day estimates of integrals are found with help from 
monotonicity. Gain some practical experience by proving that

$$\int_x^1 \log(1+t)\,\frac{dt}{t} < (2\log 2)(1 - \sqrt{x}) \quad\text{for all } 0 < x < 1$$

and by showing that $2\log 2$ cannot be replaced by a smaller constant.
Incidentally, this particular inequality is one we will see again when it 
helps us with Exercise 11.6. 


Exercise 7.11 (A Continuous Carleman-Type Inequality) 


Given an integrable $f: [a,b] \to [0,\infty)$ and an integrable weight function
$w: [a,b] \to [0,\infty)$ with integral 1 on $[a,b]$, show that one has


b b 
exp | {log f(x) }w(ax) da < cf f(x)w(a) dx. (7.27) 
Exercise 7.12 (Grüss’s Inequality: Integrals of Products)
Suppose that $-\infty < a \le A < \infty$ and $-\infty < b \le B < \infty$ and suppose
that functions $f$ and $g$ satisfy the bounds

$$a \le f(x) \le A \quad\text{and}\quad b \le g(x) \le B \quad\text{for all } 0 \le x \le 1.$$

Show that one has the bound

$$\left|\int_0^1 f(x)g(x)\,dx - \int_0^1 f(x)\,dx\int_0^1 g(x)\,dx\right| \le \frac{1}{4}(A - a)(B - b),$$

and show by example that the factor of 1/4 cannot be replaced by a
smaller constant.


8 
The Ladder of Power Means 


The quantities that provide the upper bound in Cauchy’s inequality are
special cases of the general means

$$M_t = M_t[x; p] = \left\{\sum_{k=1}^n p_k x_k^t\right\}^{1/t} \tag{8.1}$$

where $p = (p_1, p_2, \ldots, p_n)$ is a vector of positive weights with total mass
$p_1 + p_2 + \cdots + p_n = 1$ and $x = (x_1, x_2, \ldots, x_n)$ is a vector of nonnegative
real numbers. Here the parameter $t$ can be taken to be any real value,
and one can even take $t = -\infty$ or $t = \infty$, although in these cases and
the case $t = 0$ the general formula (8.1) requires some reinterpretation.
The proper definition of the power mean $M_0$ is motivated by the natural
desire to make the map $t \mapsto M_t$ a continuous function on all of $\mathbb{R}$. The
first challenge problem suggests how this can be achieved, and it also 
adds a new layer of intuition to our understanding of the geometric 
mean. 


Problem 8.1 (The Geometric Mean as a Limit) 

For nonnegative real numbers $x_k$, $k = 1, 2, \ldots, n$, and nonnegative
weights $p_k$, $k = 1, 2, \ldots, n$ with total mass $p_1 + p_2 + \cdots + p_n = 1$, one
has the limit

$$\lim_{t\to 0}\left\{\sum_{k=1}^n p_k x_k^t\right\}^{1/t} = \prod_{k=1}^n x_k^{p_k}. \tag{8.2}$$

APPROXIMATE EQUALITIES AND LANDAU’S NOTATION 


The solution of this challenge problem is explained most simply with 
the help of Landau’s little o and big O notation. In this useful shorthand, 
the statement $\lim_{t\to 0} f(t)/g(t) = 0$ is abbreviated simply by writing
$f(t) = o(g(t))$ as $t \to 0$, and, analogously, the statement that the ratio
$f(t)/g(t)$ is bounded in some neighborhood of 0 is abbreviated by writing
$f(t) = O(g(t))$ as $t \to 0$. By hiding details that are irrelevant, this
notation often allows one to render a mathematical inequality in a form 
that gets most quickly to its essential message. 

For example, it is easy to check that for all $x > -1$ one has a natural
two-sided estimate for $\log(1+x)$,

$$\frac{x}{1+x} \le \int_1^{1+x}\frac{du}{u} = \log(1+x) \le x,$$

yet, for many purposes, these bounds are more efficiently summarized
by the simpler statement

$$\log(1+x) = x + O(x^2) \quad\text{as } x \to 0. \tag{8.3}$$


Similarly, one can check that for all $|x| \le 1$ one has the bound

though, again, for many calculations we only need to know that these
bounds give us the relation

$$e^x = 1 + x + O(x^2) \quad\text{as } x \to 0. \tag{8.4}$$


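Landau’s notation and the big-O relations (8.3) and (8.4) for the logarithm
and the exponential now help us calculate quite smoothly that
as $t \to 0$ one has

$$\begin{aligned} \log\left\{\left(\sum_{k=1}^n p_k x_k^t\right)^{1/t}\right\} &= \frac{1}{t}\log\left\{\sum_{k=1}^n p_k e^{t\log x_k}\right\} \\ &= \frac{1}{t}\log\left\{\sum_{k=1}^n p_k\left(1 + t\log x_k + O(t^2)\right)\right\} \\ &= \frac{1}{t}\log\left\{1 + t\sum_{k=1}^n p_k\log x_k + O(t^2)\right\} \\ &= \sum_{k=1}^n p_k\log x_k + O(t). \end{aligned}$$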


This big-O identity is even a bit stronger than one needs to confirm the 
limit (8.2), so the solution of the challenge problem is complete. 
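The $O(t)$ rate is easy to observe numerically. The minimal sketch below uses hypothetical data. It prints $M_t - M_0$ and $(M_t - M_0)/t$ for small $t$ of both signs. The difference shrinks toward 0, and the ratio stays bounded, as the big-O calculation predicts.

```python
import math

def power_mean(x, p, t):
    """M_t[x; p] = (sum_k p_k x_k^t)^(1/t) for t != 0 (positive x_k assumed)."""
    return sum(pk * xk ** t for xk, pk in zip(x, p)) ** (1 / t)

def geometric_mean(x, p):
    """prod_k x_k^(p_k), computed as exp(sum_k p_k log x_k)."""
    return math.exp(sum(pk * math.log(xk) for xk, pk in zip(x, p)))

# Hypothetical data (not from the text).
x = [1.0, 4.0, 9.0]
p = [0.2, 0.5, 0.3]
G = geometric_mean(x, p)

for t in (1.0, 0.1, 0.01, 0.001, -0.001, -0.01):
    err = power_mean(x, p, t) - G
    # The big-O calculation predicts err = O(t), so err / t should stay bounded.
    print(f"t={t:+.3f}: M_t = {power_mean(x, p, t):.6f}, "
          f"M_t - G = {err:+.2e}, (M_t - G)/t = {err / t:+.4f}")
```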
A COROLLARY


The formula (8.2) provides a general representation of the geometric
mean as a limit of a sum, and it is worth noting that for two summands
it simply says that

$$\lim_{p\to\infty}\left\{\theta a^{1/p} + (1-\theta)b^{1/p}\right\}^p = a^{\theta}b^{1-\theta} \tag{8.5}$$

for all nonnegative $a$, $b$, and $\theta \in [0,1]$. This formula and its more complicated
cousin (8.2) give us a general way to convert information for a sum
into information for a product. 

Later we will draw some interesting inferences from this observation, 
but first we need to develop an important relation between the power 
means and the geometric mean. We will do this by a method that is 
often useful as an exploratory tool in the search for new inequalities. 


SIEGEL’S METHOD OF HALVES 


Carl Ludwig Siegel (1896-1981) observed in his lectures on the geome- 
try of numbers that the limit representation (8.2) for the geometric mean 
can be used to prove an elegant refinement of the AM-GM inequality. 
The proof calls on nothing more than Cauchy’s inequality and the limit 
characterization of the geometric mean, yet it illustrates a sly strategy 
which opens many doors. 


Problem 8.2 (Power Mean Bound for the Geometric Mean) 


Follow in Siegel’s footsteps and prove that for any nonnegative weights
$p_k$, $k = 1, 2, \ldots, n$ with total mass $p_1 + p_2 + \cdots + p_n = 1$ and for any
nonnegative real numbers $x_k$, $k = 1, 2, \ldots, n$, one has the bound

$$\prod_{k=1}^n x_k^{p_k} \le \left\{\sum_{k=1}^n p_k x_k^t\right\}^{1/t} \quad\text{for all } t > 0. \tag{8.6}$$
k=1 k=1 


As the section title hints, one way to approach such a bound is to 
consider what happens when $t$ is halved (or doubled). Specifically, one
might first aim for an inequality such as

$$M_{t/2} \le M_t \quad\text{for all } t > 0, \tag{8.7}$$


and afterwards one can then look for a way to draw the connection to 
the limit (8.2). 
As usual, Cauchy’s inequality is our compass, and again it points us
to the splitting trick. If we write $p_k x_k^{t/2} = p_k^{1/2}\cdot p_k^{1/2}x_k^{t/2}$, we find

$$M_{t/2}^t = \left(\sum_{k=1}^n p_k x_k^{t/2}\right)^2 = \left(\sum_{k=1}^n p_k^{1/2}\cdot p_k^{1/2}x_k^{t/2}\right)^2 \le \left(\sum_{k=1}^n p_k\right)\left(\sum_{k=1}^n p_k x_k^t\right) = M_t^t,$$


and now when we take the tth root of both sides, we have before us the 
conjectured doubling formula (8.7). 

To complete the solution of the challenge problem, we can simply 
iterate the process of taking halves, so, after $j$ steps, we find for all real
$t > 0$ that

$$M_{t/2^j} \le M_{t/2^{j-1}} \le \cdots \le M_{t/2} \le M_t. \tag{8.8}$$

Now, from the limit representation of the geometric mean (8.2) we have

$$\lim_{j\to\infty} M_{t/2^j} = M_0 = \prod_{k=1}^n x_k^{p_k},$$

so from the halving bound (8.8) we find that for all $t > 0$ one has

$$\prod_{k=1}^n x_k^{p_k} = M_0 \le M_t = \left\{\sum_{k=1}^n p_k x_k^t\right\}^{1/t} \quad\text{for all } t > 0. \tag{8.9}$$
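Siegel’s chain (8.8) can be traced numerically. This sketch uses hypothetical data and computes $M_t, M_{t/2}, M_{t/4}, \ldots$ for $t = 2$. The values should decrease and settle down onto the geometric mean $M_0$. Stopping after 20 halvings is our own choice, made to stay well clear of floating-point noise.

```python
import math

def power_mean(x, p, t):
    """M_t[x; p] = (sum_k p_k x_k^t)^(1/t) for t > 0."""
    return sum(pk * xk ** t for xk, pk in zip(x, p)) ** (1 / t)

# Hypothetical data (not from the text).
x = [0.5, 2.0, 8.0]
p = [0.25, 0.25, 0.5]
G = math.prod(xk ** pk for xk, pk in zip(x, p))   # the geometric mean M_0

t = 2.0
# M_t, M_{t/2}, M_{t/4}, ...: by (8.8) this sequence should be nonincreasing,
# and by (8.2) it should approach M_0 = G.
chain = [power_mean(x, p, t / 2 ** j) for j in range(20)]
for j in (0, 1, 2, 5, 10, 19):
    print(f"M_(t/2^{j}) = {chain[j]:.6f}")
print(f"M_0 = {G:.6f}")
```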


MONOTONICITY OF THE MEANS 


Siegel’s doubling relation (8.7) and the plot given in Figure 8.1 of the
two-term power mean $(px^t + qy^t)^{1/t}$ provide us with big hints about the
quantitative and qualitative features of the general mean $M_t$. Perhaps
the most basic among these is the monotonicity of the map $t \mapsto M_t$
which we address in the next challenge problem.


Problem 8.3 (Power Mean Inequality) 

Consider positive weights $p_k$, $k = 1, 2, \ldots, n$ which have total mass
$p_1 + p_2 + \cdots + p_n = 1$, and show that for nonnegative real numbers $x_k$,
$k = 1, 2, \ldots, n$, the mapping $t \mapsto M_t$ is a nondecreasing function on all
of $\mathbb{R}$. That is, show that for all $-\infty < s < t < \infty$ one has

$$\left\{\sum_{k=1}^n p_k x_k^s\right\}^{1/s} \le \left\{\sum_{k=1}^n p_k x_k^t\right\}^{1/t}. \tag{8.10}$$

Finally, show that one has equality in the bound (8.10) if and only
if $x_1 = x_2 = \cdots = x_n$.
[Figure 8.1: The Power Mean Curve. Labeled levels: $M_\infty = \max(x, y)$, $M_2 = \sqrt{px^2 + qy^2}$, $M_1 = px + qy$, $M_0 = x^p y^q$, $M_{-1} = 1/(p/x + q/y)$, $M_{-\infty} = \min(x, y)$.]

Fig. 8.1. If $x > 0$, $y > 0$, $0 < p < 1$ and $q = 1 - p$, then a qualitative plot
of $M_t = (px^t + qy^t)^{1/t}$ for $-\infty < t < \infty$ suggests several basic relationships
between the power means. Perhaps the most productive of these is simply the
fact that $M_t$ is a monotone increasing function of the power $t$, but all of the
elements of the diagram have their day.


THE FUNDAMENTAL SITUATION: 0 < s < t


One is not likely to need long to note the resemblance of our target
inequality (8.10) to the bound one obtains from Jensen’s inequality for
the map $x \mapsto x^p$ with $p > 1$,

$$\left\{\sum_{k=1}^n p_k y_k\right\}^p \le \sum_{k=1}^n p_k y_k^p.$$

In particular, if we assume $0 < s < t$ then the substitutions $y_k = x_k^s$ and
$p = t/s > 1$ give us

$$\left\{\sum_{k=1}^n p_k x_k^s\right\}^{t/s} \le \sum_{k=1}^n p_k x_k^t, \tag{8.11}$$

so taking the $t$th root gives us the power mean inequality (8.10) in the
most basic case. Moreover, the strict convexity of $x \mapsto x^p$ for $p > 1$ tells
us that if $p_k > 0$ for all $k = 1, 2, \ldots, n$, then we have equality in the
bound (8.11) if and only if $x_1 = x_2 = \cdots = x_n$.


THE REST OF THE CASES 


[Figure 8.2. Case I: $0 < s < t$; Case II: $s < t < 0$; Case III: $s < 0 < t$.]

Fig. 8.2. The power mean inequality deals with all $-\infty < s < t < \infty$ and
Jensen’s inequality deals directly with Case I and indirectly with Case II.
Case III has two halves $s = 0 < t$ and $s < t = 0$ which are consequences of
the geometric mean power mean bound (8.6).

There is something aesthetically unattractive about breaking a problem
into a collection of special cases, but sometimes such decompositions
are unavoidable. Here, as Figure 8.2 suggests, there are two further cases
to consider. The most pressing of these is Case II where $s < t < 0$, and
we cover it by applying the result of Case I. Since $-t > 0$ is smaller than
$-s > 0$, the bound of Case I gives us

$$\left\{\sum_{k=1}^n p_k x_k^{-t}\right\}^{-1/t} \le \left\{\sum_{k=1}^n p_k x_k^{-s}\right\}^{-1/s}.$$

Now, when we take reciprocals we find

$$\left\{\sum_{k=1}^n p_k x_k^{-s}\right\}^{1/s} \le \left\{\sum_{k=1}^n p_k x_k^{-t}\right\}^{1/t},$$

so when we substitute $x_k = 1/y_k$ we get the power mean inequality for
$s < t < 0$.

Case III of Figure 8.2 is the easiest of the three. By the PM-GM
inequality (8.6) for $x_k^{-1}$, $1 \le k \le n$, and the power $0 < -s$, we find after
taking reciprocals that

$$\left\{\sum_{k=1}^n p_k x_k^s\right\}^{1/s} \le \prod_{k=1}^n x_k^{p_k} \quad\text{for all } s < 0. \tag{8.12}$$

Together with the basic bound (8.6) for $0 < t$, this completes the proof
of Case III.

All that remains now is to acknowledge that the three cases still leave 
some small cracks unfilled; specifically, the boundary situations 0 = s < t 
and s < t = 0 have been omitted from the three cases of Figure 8.2. 
Fortunately, these situations were already covered by the bounds (8.6) 
and (8.12), so the solution of the challenge problem really is complete. 
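As a quick empirical companion to the case analysis, here is a minimal sketch that evaluates $M_t$ for randomly generated, hypothetical data. The values of $t$ are spread across Cases I, II and III, with $t = 0$ handled through the geometric mean. The sketch checks that the values increase, and that every power mean equals the common value when all $x_k$ coincide.

```python
import math
import random

def power_mean(x, p, t):
    """M_t[x; p] for positive x_k; t = 0 uses the geometric mean, as in (8.2)."""
    if t == 0:
        return math.exp(sum(pk * math.log(xk) for xk, pk in zip(x, p)))
    return sum(pk * xk ** t for xk, pk in zip(x, p)) ** (1 / t)

random.seed(1)                                        # hypothetical random data
n = 5
x = [random.uniform(0.1, 10.0) for _ in range(n)]
raw = [random.uniform(0.1, 1.0) for _ in range(n)]
total = sum(raw)
p = [r / total for r in raw]                          # positive weights, total mass 1

ts = [-4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0]   # spans Cases I, II, and III
means = [power_mean(x, p, t) for t in ts]
for t, m in zip(ts, means):
    print(f"M_{t:+.1f} = {m:.6f}")

# Equality case: when all x_k coincide, every power mean equals the common value.
equal = [power_mean([3.0] * n, p, t) for t in ts]
```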
In retrospect, Cases II and III resolved themselves more easily than
one might have guessed. There is even some charm in the way the 
geometric mean resolved the relation between the power means with 
positive and negative powers. Perhaps we can be encouraged by this 
experience the next time we are forced to face a case-by-case argument. 


SOME SPECIAL MEANS 

We have already seen that some of the power means deserve special
attention, and, after $t = 2$, $t = 1$, and $t = 0$, the cases most worthy of
note are $t = -1$ and the limit values one obtains by taking $t \to \infty$ or by
taking $t \to -\infty$. When $t = -1$, the mean $M_{-1}$ is called the harmonic
mean and in longhand it is given by

$$M_{-1} = M_{-1}[x; p] = \frac{1}{p_1/x_1 + p_2/x_2 + \cdots + p_n/x_n}.$$

From the power mean inequality (8.10) we know that $M_{-1}$ provides a
lower bound on the geometric mean, and, a fortiori, one has a bound on
the arithmetic mean. Specifically, we have the harmonic mean-geometric
mean inequality (or the HM-GM inequality)

$$\frac{1}{p_1/x_1 + p_2/x_2 + \cdots + p_n/x_n} \le x_1^{p_1}x_2^{p_2}\cdots x_n^{p_n} \tag{8.13}$$

and, as a corollary, one also has the harmonic mean-arithmetic mean
inequality (or the HM-AM inequality)

$$\frac{1}{p_1/x_1 + p_2/x_2 + \cdots + p_n/x_n} \le p_1x_1 + p_2x_2 + \cdots + p_nx_n. \tag{8.14}$$

Sometimes these inequalities come into play just as they are written,
but perhaps more often we use them “upside down” where they give us
useful lower bounds for the weighted sums of reciprocals:

$$\frac{1}{x_1^{p_1}x_2^{p_2}\cdots x_n^{p_n}} \le \frac{p_1}{x_1} + \frac{p_2}{x_2} + \cdots + \frac{p_n}{x_n}, \tag{8.15}$$

$$\frac{1}{p_1x_1 + p_2x_2 + \cdots + p_nx_n} \le \frac{p_1}{x_1} + \frac{p_2}{x_2} + \cdots + \frac{p_n}{x_n}. \tag{8.16}$$
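A concrete instance may help fix the chain HM ≤ GM ≤ AM and its "upside down" forms. With the hypothetical values $x = (1, 2, 4, 8)$ and weights $p = (0.4, 0.3, 0.2, 0.1)$, the sketch below gives a harmonic mean of about 1.633, a geometric mean of 2, and an arithmetic mean of 2.6. It also checks (8.15) and (8.16) against the weighted sum of reciprocals, 0.6125.

```python
import math

# Hypothetical data (not from the text).
x = [1.0, 2.0, 4.0, 8.0]
p = [0.4, 0.3, 0.2, 0.1]

recip = sum(pk / xk for xk, pk in zip(x, p))        # p_1/x_1 + ... + p_n/x_n
hm = 1 / recip                                     # harmonic mean M_{-1}
gm = math.prod(xk ** pk for xk, pk in zip(x, p))   # geometric mean M_0
am = sum(pk * xk for xk, pk in zip(x, p))          # arithmetic mean M_1

print(f"HM = {hm:.4f} <= GM = {gm:.4f} <= AM = {am:.4f}")                     # (8.13), (8.14)
print(f"1/GM = {1 / gm:.4f} and 1/AM = {1 / am:.4f} are both <= {recip:.4f}") # (8.15), (8.16)
```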


GOING TO EXTREMES 


The last of the power means to require special handling are those for
the extreme values $t = -\infty$ and $t = \infty$ where the appropriate definitions
are given by

$$M_{-\infty}[x; p] = \min_k x_k \quad\text{and}\quad M_\infty[x; p] = \max_k x_k. \tag{8.17}$$
With this interpretation one has all of the properties that Figure 8.1
suggests. In particular, one has the obvious (but useful) bounds

$$M_{-\infty}[x;p] \le M_t[x;p] \le M_\infty[x;p] \quad\text{for all } t \in \mathbb{R},$$

and one also has the two continuity relations

$$\lim_{t\to\infty} M_t[x;p] = M_\infty[x;p] \quad\text{and}\quad \lim_{t\to-\infty} M_t[x;p] = M_{-\infty}[x;p].$$

To check these limits, we first note that for all t > 0 and all 1 ≤ k ≤ n
we have the elementary bounds

$$p_k x_k^t \le M_t^t[x;p] \le M_\infty^t[x;p],$$

and, since $p_k > 0$, we have $p_k^{1/t} \to 1$ as $t \to \infty$, so we can take roots and
let $t \to \infty$ to deduce that for all 1 ≤ k ≤ n we have

$$x_k \le \liminf_{t\to\infty} M_t[x;p] \le \limsup_{t\to\infty} M_t[x;p] \le M_\infty[x;p].$$

Since $\max_k x_k = M_\infty[x;p]$, we have the same bound on both the extreme
left and extreme right, so in the end we see

$$\lim_{t\to\infty} M_t[x;p] = M_\infty[x;p].$$
This confirms the first continuity relation, and in view of the general
identity $M_{-t}(x_1, x_2, \ldots, x_n; p) = M_t^{-1}(1/x_1, 1/x_2, \ldots, 1/x_n; p)$, the second
continuity relation follows from the first.
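The limits $M_t \to M_\infty$ and $M_t \to M_{-\infty}$ can also be watched numerically. The sketch below is only an illustration, assuming positive $x_k$ and strictly positive weights that sum to 1; `power_mean` is a hypothetical helper. Rescaling by the largest (or smallest) $x_k$ mirrors the elementary bounds used above and keeps large powers from overflowing.

```python
def power_mean(x, p, t):
    """Weighted power mean M_t[x; p] for t != 0, assuming x_k > 0 and p_k > 0 summing to 1.

    The x_k are rescaled by max(x) when t > 0 and by min(x) when t < 0, so every
    ratio ** t lies in (0, 1] and large |t| cannot overflow.
    """
    m = max(x) if t > 0 else min(x)
    s = sum(pk * (xk / m) ** t for xk, pk in zip(x, p))
    return m * s ** (1.0 / t)

x = [0.5, 2.0, 3.0]
p = [0.2, 0.5, 0.3]
for t in [1, 10, 100, 1000]:
    print(t, power_mean(x, p, t), power_mean(x, p, -t))
# power_mean(x, p, t) climbs toward max(x) = 3.0, while power_mean(x, p, -t) falls toward min(x) = 0.5.
```

Dividing by max(x) or min(x) before raising to the power t is purely a floating-point precaution; mathematically it changes nothing, since $M_t$ is homogeneous of degree one in x.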


THE INTEGRAL ANALOGS 


The integral analogs of the power means are also important, and their
relationships follow in lock-step with those one finds for sums. To make
this notion precise, we take D ⊂ ℝ and we consider a weight function
w : D → (0, ∞) which satisfies

$$\int_D w(x)\,dx = 1 \quad\text{and}\quad w(x) > 0 \ \text{ for all } x \in D;$$

then for f : D → [0, ∞] and t ∈ (−∞, 0) ∪ (0, ∞) we define the tth power
mean of f by the formula

$$M_t = M_t[f;w] = \left\{ \int_D f^t(x)\,w(x)\,dx \right\}^{1/t}. \tag{8.18}$$


As in the discrete case, the mean $M_0$ requires special attention, and for
the integral mean the appropriate definition requires one to set

$$M_0[f;w] = \exp\left( \int_D \{\log f(x)\}\, w(x)\,dx \right). \tag{8.19}$$
Despite the differences in the two forms (8.18) and (8.19), the definition
(8.19) should not come as a surprise. After all, we found earlier
(page 114) that the formula (8.19) is the natural integral analog of the 
geometric mean of f with respect to the weight function w. 

Given the definitions (8.18) and (8.19), one now has the perfect analog 
of the discrete power mean inequality; specifically, one has 


$$M_s[f;w] \le M_t[f;w] \quad\text{for all } -\infty < s < t < \infty. \tag{8.20}$$


Moreover, for well-behaved f, say, those that are continuous, one has 
equality in the bound (8.20) if and only if f is constant on D. 

We have already invested considerable effort on the discrete power
mean inequality (8.10), so we will not take the time here to work out a
proof of the continuous analog (8.20), even though such a proof provides
a worthwhile exercise that every reader is encouraged to pursue. Instead,
we take up a problem which shows as well as any other just how effective
the basic bound $M_0[f;w] \le M_1[f;w]$ is. In fact, we will only use the
simplest case when D = [0, 1] and w(x) = 1 for all x ∈ D.
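For this simplest case the bound $M_0[f;w] \le M_1[f;w]$ can be checked with a crude quadrature. This sketch rests on its own choices: the midpoint rule, the hypothetical helper `integral_means`, and the test function f(x) = 1 + x. For that f one has $M_1 = 3/2$ and $M_0 = \exp(2\log 2 - 1) = 4/e$.

```python
import math

def integral_means(f, n=100_000):
    """Midpoint-rule approximations of M_0[f; w] and M_1[f; w] on D = [0, 1] with w = 1.

    M_0 = exp(integral of log f) as in (8.19), and M_1 = integral of f, i.e. (8.18) with t = 1.
    Assumes f is positive and continuous on [0, 1].
    """
    h = 1.0 / n
    mids = [(i + 0.5) * h for i in range(n)]
    m0 = math.exp(h * math.fsum(math.log(f(x)) for x in mids))
    m1 = h * math.fsum(f(x) for x in mids)
    return m0, m1

m0, m1 = integral_means(lambda x: 1.0 + x)
print(m0, m1, 4 / math.e)  # M_0 should be close to 4/e and M_1 close to 3/2
```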


CARLEMAN’S INEQUALITY AND THE CONTINUOUS AM-GM BOUND


In Chapter 2 we used Pólya’s proof of Carleman’s geometric mean
bound,

$$\sum_{k=1}^\infty (a_1 a_2 \cdots a_k)^{1/k} \le e \sum_{k=1}^\infty a_k, \tag{8.21}$$

as a vehicle to help illustrate the value of restructuring a problem so
that the AM-GM inequality could be used where it is most efficient.
Pólya’s proof is an inspirational classic, but if one is specifically curious
about Carleman’s inequality, then there are several natural questions
that Pólya’s analysis leaves unanswered.

One feature of Pólya’s proof that many people find perplexing is that
it somehow manages to provide an effective estimate of the total of all
the summands $(a_1 a_2 \cdots a_k)^{1/k}$ without providing a compelling estimate
for the individual summands when they are viewed one at a time. The
next challenge problem solves part of this mystery by showing that there
is indeed a bound for the individual summands which is good enough so
that it can be summed to obtain Carleman’s inequality.
Problem 8.4 (Termwise Bounds for Carleman’s Summands)

Show that for positive real numbers $a_k$, k = 1, 2, ..., one has

$$(a_1 a_2 \cdots a_n)^{1/n} \le \frac{e}{2n^2} \sum_{k=1}^n (2k-1)\,a_k \quad\text{for } n = 1, 2, \ldots, \tag{8.22}$$

and then show that these bounds can be summed to prove the classical
Carleman inequality (8.21).


A REASONABLE FIRST STEP 


The unspoken hint of our problem’s location suggests that one should
look for a role for the integral analogs of the power means. Since we
need to estimate the terms $(a_1 a_2 \cdots a_n)^{1/n}$, it also seems reasonable to
consider the integrand f : [0, ∞) → ℝ where we take f(x) to be equal
to $a_k$ on the interval (k − 1, k] for 1 ≤ k < ∞. This choice makes it easy
for us to put the left side of the target inequality (8.22) into an integral
form:

$$(a_1 a_2 \cdots a_n)^{1/n} = \exp\left\{ \int_0^1 \log f(ny)\,dy \right\}. \tag{8.23}$$

This striking representation for the geometric mean almost begs us to
apply the continuous version of the AM-GM inequality.

Unfortunately, if we were to acquiesce, we would find ourselves embarrassed;
the immediate application of the continuous AM-GM inequality
to the formula (8.23) returns us unceremoniously to the classical
discrete AM-GM inequality. For the moment, it may seem that the nice
representation (8.23) really accomplishes nothing, and we may even be
tempted to abandon this whole line of investigation. Here, and at similar
moments, one should take care not to desert a natural plan too quickly.


A DEEPER LOOK 


The naive application of the AM-GM bound leaves us empty handed, 
but surely there is something more that we can do. At a minimum, we 
can review some of Pólya’s questions and, as we work down the list,
we may be struck by the one that asks, “Is it possible to satisfy the 
condition?” 
Here the notions of condition and conclusion are intertwined, but ultimately
we need a bound like the one given by the right side of our target
inequality (8.22). Once this is said, we will surely ask ourselves where
the constant factor e is to be found. Such a factor is not in the formula
(8.23) as it stands, but perhaps we can put it there.

This question requires exploration, but if one thinks how e might be 
expressed in a form that is analogous to the right side of the formula 
(8.23), then sooner or later one is likely to have the lucky thought of 
replacing f(ny) by y. One would then notice that 


$$e = \exp\left\{ -\int_0^1 \log y\,dy \right\}, \tag{8.24}$$


and this identity puts us back on the scent. We just need to slip log y 
into the integrand and return to our original plan. Specifically, we find 


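$$\begin{aligned}
\exp\left\{ \int_0^1 \log f(ny)\,dy \right\} &= \exp\left\{ \int_0^1 \big( \log\{y f(ny)\} - \log y \big)\,dy \right\} \\
&= e \exp\left\{ \int_0^1 \log\{y f(ny)\}\,dy \right\} \\
&\le e \int_0^1 y f(ny)\,dy,
\end{aligned} \tag{8.25}$$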


where in the last step we finally get to apply the integral version of the 
AM-GM inequality. 


TWO FINAL STEPS


Now, for the function f defined by setting $f(x) = a_k$ for $x \in (k-1, k]$,
we have the elementary identity

$$\int_0^1 y f(ny)\,dy = \sum_{k=1}^n a_k \int_{(k-1)/n}^{k/n} y\,dy = \frac{1}{2n^2} \sum_{k=1}^n (2k-1)\,a_k, \tag{8.26}$$


so, in view of the general bound (8.25) and the identity (8.23), the proof 
of the first inequality (8.22) of the challenge problem is complete. 

All that remains is for us to add up the termwise bounds (8.22) and 
check that the sum yields the classical form of Carleman’s inequality 
(8.21). This is easy enough, but some care is still needed to squeeze out 
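exactly the right final bound. Specifically, we note that for each k ≥ 1

$$\sum_{n=k}^\infty \frac{1}{n^2} < \sum_{n=k}^\infty \frac{1}{n^2 - 1/4} = \sum_{n=k}^\infty \left( \frac{1}{n - 1/2} - \frac{1}{n + 1/2} \right) = \frac{2}{2k-1},$$

and, when we insert this bound into the sum over n of the termwise bounds (8.22)
and interchange the order of summation, we find

$$\sum_{n=1}^\infty (a_1 a_2 \cdots a_n)^{1/n} \le e \sum_{k=1}^\infty \frac{2k-1}{2}\, a_k \sum_{n=k}^\infty \frac{1}{n^2} \le e \sum_{k=1}^\infty a_k,$$

so the termwise estimate obtained from (8.25) and (8.26) does indeed complete
the proof of Carleman’s inequality.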
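The termwise bounds (8.22) and their sum can be tested on a concrete sequence. The following sketch is only an illustration, assuming a positive sequence. The helper `carleman_terms` and the test sequence $a_k = 1/k^2$ are hypothetical choices, and a finite truncation can illustrate the infinite inequality but cannot prove it.

```python
import math

def carleman_terms(a):
    """For a positive sequence a_1, ..., a_N, return two lists indexed by n = 1, ..., N:
    the geometric means (a_1 ... a_n)^(1/n) and the bounds e/(2 n^2) * sum_{k<=n} (2k - 1) a_k.
    Running sums keep the cost linear in N; logarithms avoid overflow in the product.
    """
    log_sum, weighted_sum = 0.0, 0.0
    geo, bound = [], []
    for n, an in enumerate(a, start=1):
        log_sum += math.log(an)
        weighted_sum += (2 * n - 1) * an
        geo.append(math.exp(log_sum / n))
        bound.append(math.e * weighted_sum / (2 * n * n))
    return geo, bound

a = [1.0 / k ** 2 for k in range(1, 5001)]   # an illustrative test sequence
geo, bound = carleman_terms(a)
print(all(g <= b for g, b in zip(geo, bound)))   # the termwise bounds (8.22)
print(sum(geo), sum(bound), math.e * sum(a))     # sum(geo) <= sum(bound) <= e * sum(a)
```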


EXERCISES 


Exercise 8.1 (Power Means in Disguise) 

To use the power mean inequality effectively one must be able to pick 
power means out of a crowd, and this exercise provides some practice. 
Prove that for positive x, y, and z, one has 


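$$\frac{9}{2(x+y+z)} \le \frac{1}{x+y} + \frac{1}{x+z} + \frac{1}{y+z}, \tag{8.27}$$

and prove that for p ≥ 1 one also has

$$\frac{3^{2-p}}{2}\,(x+y+z)^{p-1} \le \frac{x^p}{y+z} + \frac{y^p}{x+z} + \frac{z^p}{x+y}. \tag{8.28}$$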


Incidentally, one might note that for p = 1 the second bound reduces to 
the much-proved Nesbitt inequality of Exercise 5.6. 


Exercise 8.2 (Harmonic Means and Recognizable Sums) 


Suppose $x_1, x_2, \ldots, x_n$ are positive and let S denote their sum. Show
that we have the bound

$$\frac{n^2}{2n-1} \le \frac{S}{2S - x_1} + \frac{S}{2S - x_2} + \cdots + \frac{S}{2S - x_n}.$$


In this problem (and many like it) one gets a nice hint from the fact 
that there is a simple expression for the sum of the denominators on the 
right-hand side. 
Exercise 8.3 (Integral Analogs and Homogeneity in Σ)
(a) Show that for all nonnegative sequences $\{a_k : 1 \le k \le n\}$ one has

$$\left\{ \sum_{k=1}^n \sqrt{a_k} \right\}^2 \le \left\{ \sum_{k=1}^n a_k^{1/3} \right\}^3, \tag{8.29}$$


and be sure to notice the differences between this bound and the power 
mean inequality (8.10) with s = 1/3 and t = 1/2. 

(b) By analogy with the bound (8.29), one might carelessly guess that
for nonnegative f one has an integral bound

$$\left\{ \int f^{1/2}(x)\,dx \right\}^2 \le \left\{ \int f^{1/3}(x)\,dx \right\}^3. \tag{8.30}$$


Show by example that the bound (8.30) does not hold in general. 

The likelihood of an integral analog can often be explained by a heuristic
principle which Hardy, Littlewood, and Pólya (1952, p. 4) describe
as “homogeneity in Σ.” The principle suggests that we consider Σ in a
bound such as (8.29) as a formal symbol. In this case we see that the left
side is “homogeneous of order two in Σ” while the right side is “homogeneous
of order three in Σ.” The two sides are therefore incompatible,
and one should not expect any integral analog. On the other hand, in
Cauchy’s inequality and Hölder’s inequality, both sides are homogeneous
of order one in Σ. It is therefore natural, even inevitable, that we
should have integral analogs for these bounds.


Exercise 8.4 (Pólya’s Minimax Characterization)

Suppose you must guess the value of an unknown number x in the
interval [a, b] ⊂ (0, ∞) and suppose you will be forced to pay a fine
based on the relative error of your guess. How should you guess if you
want to minimize the worst fine that you would have to pay?

If your guess is p, then the maximum fine you would have to pay is

$$F(p) = \max_{x \in [a,b]} \left\{ \frac{|p - x|}{x} \right\}, \tag{8.31}$$

so your analytical challenge is to find the value $p^*$ such that

$$F(p^*) = \min_p F(p) = \min_p \max_{x \in [a,b]} \left\{ \frac{|p - x|}{x} \right\}. \tag{8.32}$$


One expects p* to be some well-known mean, but which one is it? 
Exercise 8.5 (The Geometric Mean as a Minimum)
Prove that the geometric mean has the representation

$$\left\{ \prod_{k=1}^n a_k \right\}^{1/n} = \min\left\{ \frac{1}{n} \sum_{k=1}^n a_k x_k : (x_1, x_2, \ldots, x_n) \in D \right\}, \tag{8.33}$$

where D is the region of $\mathbb{R}^n$ defined by

$$D = \left\{ (x_1, x_2, \ldots, x_n) : \prod_{k=1}^n x_k = 1,\ x_k > 0,\ k = 1, 2, \ldots, n \right\}.$$
For practice with this characterization of the geometric mean, use it to 
give another proof that the geometric mean is superadditive; that is, 
show that the formula (8.33) implies the bound (2.31) on page 34. 


Exercise 8.6 (More on the Method of Halves) 

The method of halves applies to more than just inequalities; it can 
also be used to prove some elegant identities. As an illustration, show 
that the familiar half-angle formula sin x = 2sin(x/2) cos(x/2) implies 
the infinite product identity 


$$\frac{\sin x}{x} = \prod_{k=1}^\infty \cos(x/2^k), \tag{8.34}$$


and verify in turn that this implies the poignant formula 


$$\frac{2}{\pi} = \frac{\sqrt{2}}{2} \cdot \frac{\sqrt{2+\sqrt{2}}}{2} \cdot \frac{\sqrt{2+\sqrt{2+\sqrt{2}}}}{2} \cdots.$$


Incidentally, the product formula (8.34) for sin(x)/x is known as Viète’s
identity, and it has been known since 1593.
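Viète's formula converges quickly, and a few lines of Python make that visible. This sketch relies only on the formulas above. The helper names are hypothetical: the first function builds the nested radicals directly, and the second evaluates partial products of (8.34).

```python
import math

def viete_partial_product(m):
    """Product of the first m factors sqrt(2)/2, sqrt(2 + sqrt(2))/2, ... of the 2/pi formula."""
    radical, product = 0.0, 1.0
    for _ in range(m):
        radical = math.sqrt(2.0 + radical)   # sqrt(2), sqrt(2 + sqrt(2)), ...
        product *= radical / 2.0             # the kth factor equals cos(pi / 2^(k+1))
    return product

def cosine_partial_product(x, m):
    """Partial product prod_{k=1}^m cos(x / 2^k) from (8.34)."""
    return math.prod(math.cos(x / 2 ** k) for k in range(1, m + 1))

print(viete_partial_product(30), 2 / math.pi)
print(cosine_partial_product(1.0, 40), math.sin(1.0))
```

Taking x = π/2 in (8.34) turns the cosine product into the nested radicals, which is why the two printed comparisons express the same identity.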


Exercise 8.7 (Differentiation of an Inequality) 

In general one cannot differentiate the two sides of an inequality and 
expect any meaningful consequences, but there are special situations 
where “differentiation of an inequality” does make sense. There are even 
times when such differentiations have led to spectacular new results.
The aspirations of this exercise are more modest, but they point the way 
to what is possible. 

(a) Consider a function f that is differentiable at $t_0$ and that satisfies
the bound $f(t_0) \le f(t)$ for all $t \in [t_0, t_0 + \Delta)$ and some Δ > 0. Show
that one then has $0 \le f'(t_0)$.
(b) Use the preceding observation to show that the power mean inequality
implies that for all $x_k > 0$ and all nonnegative $p_k$ with total
$p_1 + p_2 + \cdots + p_n = 1$, one has

$$\left\{ \sum_{k=1}^n p_k x_k \right\} \log\left\{ \sum_{k=1}^n p_k x_k \right\} \le \sum_{k=1}^n p_k x_k \log x_k. \tag{8.35}$$
k=1 k=1 k=1 
Exercise 8.8 (A Niven–Zuckerman Lemma for pth Powers)
Consider a sequence of n-tuples of nonnegative real numbers
$(a_{1k}, a_{2k}, \ldots, a_{nk})$, k = 1, 2, ....
Suppose there is a constant μ ≥ 0 for which one has

$$a_{1k} + a_{2k} + \cdots + a_{nk} \to n\mu \quad\text{as } k \to \infty, \tag{i}$$

and suppose for some 1 < p < ∞ one also has

$$a_{1k}^p + a_{2k}^p + \cdots + a_{nk}^p \to n\mu^p \quad\text{as } k \to \infty. \tag{ii}$$

Show that these conditions imply that one then has the n-term limit

$$\lim_{k\to\infty} a_{jk} = \mu \quad\text{for all } 1 \le j \le n.$$


This exercise provides an example of a consistency principle which in this 
case asserts that if the sum of the coordinates of a vector and the sum 
of the corresponding pth powers have limits that are consistent with the 
possibility that all of the coordinates converge to a common constant, 
then that must indeed be the case. The consistency principle has many 
variations and, like the optimality principle of Exercise 2.8, page 33, it 
provides useful heuristic guidance even when it does not formally apply. 


Exercise 8.9 (Points Crowded in an Interval) 

Given n points in the interval [−1, 1], we know that some pairs must
be close together, and there are many ways to quantify this crowding.
An uncommon yet insightful way once exploited by Paul Erdős is to look
at the sum of the reciprocal gaps.

(a) Suppose that $-1 \le x_1 < x_2 < \cdots < x_n \le 1$, and show that

$$\sum_{1 \le j < k \le n} \frac{1}{x_k - x_j} \ge \frac{1}{8}\, n^2 \log n.$$

(b) Show that for any permutation σ : [n] → [n] one has the bound

$$\max_{1 \le k \le n} \sum_{j=1}^{k-1} \frac{1}{|x_{\sigma(k)} - x_{\sigma(j)}|} \ge \frac{1}{8}\, n \log n.$$


9 
Hölder’s Inequality


Four results provide the central core of the classical theory of inequalities,
and we have already seen three of these: the Cauchy–Schwarz
inequality, the AM-GM inequality, and Jensen’s inequality. The quartet
is completed by a result which was first obtained by L.J. Rogers in 1888
and which was derived in another way a year later by Otto Hölder. Cast
in its modern form, the inequality asserts that for all nonnegative $a_k$
and $b_k$, k = 1, 2, ..., n, one has the bound

$$\sum_{k=1}^n a_k b_k \le \left( \sum_{k=1}^n a_k^p \right)^{1/p} \left( \sum_{k=1}^n b_k^q \right)^{1/q}, \tag{9.1}$$

provided that the powers p > 1 and q > 1 satisfy the relation

$$\frac{1}{p} + \frac{1}{q} = 1. \tag{9.2}$$
Ironically, the articles by Rogers and Hölder leave the impression that
these authors were mainly concerned with the extension and application 
of the AM-GM inequality. In particular, they did not seem to view 
their version of the bound (9.1) as singularly important, though Rogers 
did value it enough to provide two proofs. Instead, the opportunity fell 
to Frigyes Riesz to cast the inequality (9.1) in its modern form and to 
recognize its fundamental role. Thus, one can argue that the bound (9.1) 
might better be called Rogers’s inequality, or perhaps even the
Rogers–Hölder–Riesz inequality. Nevertheless, long ago, the moving hand of
history began to write “Hölder’s inequality,” and now, for one to use
another name would be impractical, though from time to time some 
acknowledgment of the historical record seems appropriate. 
The first challenge problem is easy to anticipate: one must prove the
inequality (9.1), and one must determine the circumstances where equality
can hold. As usual, readers who already know a proof of Hölder’s
inequality are invited to discover a new one. Although new proofs of
Hölder’s inequality appear less often than those for the Cauchy–Schwarz
inequality or the AM-GM inequality, one can have confidence that they
can be found.


Problem 9.1 (Hölder’s Inequality)

First prove Riesz’s version (9.1) of the inequality of Rogers (1888) and
Hölder (1889), then prove that one has equality for a nonzero sequence
$a_1, a_2, \ldots, a_n$ if and only if there exists a constant λ ∈ ℝ such that

$$\lambda a_k^p = b_k^q \quad\text{for all } 1 \le k \le n. \tag{9.3}$$


BUILDING ON THE PAST 


Surely one’s first thought is to try to adapt one of the many proofs 
of Cauchy’s inequality; it may even be instructive to see how some of 
these come up short. For example, when p ≠ 2, Schwarz’s argument is
a nonstarter since there is no quadratic polynomial in sight. Similarly, 
the absence of a quadratic form means that one is unlikely to find an 
effective analog of Lagrange’s identity. 

This brings us to our most robust proof of Cauchy’s inequality, the 
one that starts with the so-called “humble bound,” 

$$xy \le \frac{1}{2}x^2 + \frac{1}{2}y^2 \quad\text{for all } x, y \in \mathbb{R}. \tag{9.4}$$


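This bound may now remind us that the general AM-GM inequality
(2.9), page 23, implies that

$$x^\alpha y^\beta \le \frac{\alpha}{\alpha+\beta}\, x^{\alpha+\beta} + \frac{\beta}{\alpha+\beta}\, y^{\alpha+\beta} \tag{9.5}$$

for all x ≥ 0, y ≥ 0, α > 0, and β > 0. If we then set $u = x^\alpha$, $v = y^\beta$,
$p = (\alpha+\beta)/\alpha$, and $q = (\alpha+\beta)/\beta$, then we find for all p > 1 that one
has the handy inference

$$\frac{1}{p} + \frac{1}{q} = 1 \implies uv \le \frac{1}{p}u^p + \frac{1}{q}v^q \quad\text{for all } u, v \in \mathbb{R}^+. \tag{9.6}$$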
This is the perfect analog of the “humble bound” (9.4). It is known as
Young’s inequality, and it puts us well on the way to a solution of our
challenge problem.
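Young's inequality (9.6) is easy to probe numerically. In the sketch below, `young_gap` is a hypothetical helper that returns the slack $u^p/p + v^q/q - uv$. Random sampling can only illustrate the bound, not prove it, and the sampling ranges are arbitrary choices.

```python
import random

def young_gap(u, v, p):
    """Return u^p/p + v^q/q - u*v with q = p/(p - 1); by (9.6) this is >= 0 for u, v >= 0, p > 1."""
    q = p / (p - 1.0)   # conjugate exponent, so 1/p + 1/q = 1
    return u ** p / p + v ** q / q - u * v

rng = random.Random(2024)  # fixed seed so the experiment is repeatable
gaps = [young_gap(rng.uniform(0, 5), rng.uniform(0, 5), rng.uniform(1.1, 6.0))
        for _ in range(10_000)]
print(min(gaps))                 # no sample should be meaningfully negative
print(young_gap(2.0, 4.0, 3.0))  # u^p = v^q = 8 is the AM-GM equality case, so this is 0 up to rounding
```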
ANOTHER ADDITIVE TO MULTIPLICATIVE TRANSITION


The rest of the proof of Hölder’s inequality follows a familiar pattern.
If we make the substitutions $u \mapsto a_k$ and $v \mapsto b_k$ in the bound (9.6) and
sum over 1 ≤ k ≤ n, then we find

$$\sum_{k=1}^n a_k b_k \le \frac{1}{p}\sum_{k=1}^n a_k^p + \frac{1}{q}\sum_{k=1}^n b_k^q, \tag{9.7}$$


and to pass from this additive bound to a multiplicative bound we can 
apply the normalization device with which we have already scored two 
successes. We can assume without loss of generality that neither of our 
sequences is identically zero, so the normalized variables 


$$\hat a_k = a_k \Big/ \left( \sum_{j=1}^n a_j^p \right)^{1/p} \quad\text{and}\quad \hat b_k = b_k \Big/ \left( \sum_{j=1}^n b_j^q \right)^{1/q}$$

are well defined. Now, if we simply substitute these values into the 
additive bound (9.7), we find that easy arithmetic guides us quickly to 
the completion of the direct half of the challenge problem. 
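The normalization step can be traced in code. This is a minimal sketch, assuming nonnegative sequences that are not identically zero and p > 1; `holder_via_normalization` is a hypothetical name. It shows that for the hatted variables the additive right side of (9.7) becomes exactly 1/p + 1/q = 1, which is what turns the additive bound into the multiplicative one.

```python
def holder_via_normalization(a, b, p):
    """For nonnegative, not identically zero a and b and p > 1, return the tuple
    (sum of a_hat*b_hat, right side of (9.7) for the hats, sum of a*b, Hölder bound)."""
    q = p / (p - 1.0)
    norm_a = sum(x ** p for x in a) ** (1.0 / p)
    norm_b = sum(y ** q for y in b) ** (1.0 / q)
    a_hat = [x / norm_a for x in a]   # now sum(a_hat ** p) == 1
    b_hat = [y / norm_b for y in b]   # now sum(b_hat ** q) == 1
    additive_rhs = sum(x ** p for x in a_hat) / p + sum(y ** q for y in b_hat) / q  # = 1/p + 1/q
    normalized_lhs = sum(x * y for x, y in zip(a_hat, b_hat))
    return normalized_lhs, additive_rhs, sum(x * y for x, y in zip(a, b)), norm_a * norm_b

print(holder_via_normalization([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 3.0))
```

Multiplying the normalized inequality by norm_a * norm_b recovers (9.1) for the original sequences.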


LOOKING BACK — CONTEMPLATING CONJUGACY 


In retrospect, Riesz’s argument is straightforward, but the easy proof 
does not tell the whole story. In fact, Riesz’s formulation carried much 
of the burden, and he was particularly wise to focus our attention on the 
pairs of powers p and q such that 1/p+1/q = 1. Such (p,q) pairs are 
now said to be conjugate, and many problems depend on the trade-offs 
we face when we choose one conjugate pair over another. This balance 
is already visible in the p-q generalization (9.6) of the “humble bound” 
(9.4), but soon we will see deeper examples. 


BACKTRACKING AND THE CASE OF EQUALITY 


To complete the challenge problem, we still need to determine the circumstances
where one has equality. To begin, we first note that equality
trivially holds if $b_k = 0$ for all 1 ≤ k ≤ n, but in that case the identity
(9.3) is satisfied with λ = 0; thus, we may assume without loss of generality that
both sequences are nonzero.

Next, we note that equality is attained in Hölder’s inequality (9.1) if
and only if equality holds in the additive bound (9.7) when it is applied
to the normalized variables $\hat a_k$ and $\hat b_k$. By the termwise bound (9.6), we
further see that equality holds in the additive bound (9.7) if and only if
we have

$$\hat a_k \hat b_k = \frac{1}{p}\hat a_k^p + \frac{1}{q}\hat b_k^q \quad\text{for all } k = 1, 2, \ldots, n.$$

$$\begin{aligned}
\sum_{k=1}^n a_k b_k = \Big( \sum_{k=1}^n a_k^p \Big)^{1/p} \Big( \sum_{k=1}^n b_k^q \Big)^{1/q}
&\iff \sum_{k=1}^n \hat a_k \hat b_k = 1 \\
&\iff \hat a_k \hat b_k = \hat a_k^p/p + \hat b_k^q/q, \quad k = 1, 2, \ldots, n \\
&\iff \hat a_k^p = \hat b_k^q, \quad k = 1, 2, \ldots, n
\end{aligned}$$

Fig. 9.1. The case for equality in Hölder’s inequality is easily framed as a
blackboard display, and such a semi-graphical presentation has several advantages
over a monologue of “if and only if” assertions. In particular, it helps
us to see the argument at a glance, and it encourages us to question each of
the individual inferences.


Next, by the condition for equality in the special AM-GM bound (9.5),
we find that for each 1 ≤ k ≤ n we must have $\hat a_k^p = \hat b_k^q$. Finally, when we
peel away the normalization indicated by the hats, we see that $\lambda a_k^p = b_k^q$
for all 1 ≤ k ≤ n, where λ is given explicitly by

$$\lambda = \left( \sum_{k=1}^n b_k^q \right) \Big/ \left( \sum_{k=1}^n a_k^p \right).$$

This is the characterization that we anticipated, and the solution of the
challenge problem is complete.


A BLACKBOARD TOOL FOR BETTER CHECKING 


Backtracking arguments, such as the one just given, are notorious for 
harboring gaps, or even outright errors. It seems that after working 
through a direct argument, many of us are just too tempted to believe 
that nothing could go wrong when the argument is “reversed.” Unfor- 
tunately, there are times when this is wishful thinking. 

A semi-graphical “blackboard display” such as that of Figure 9.1 may 
be of help here. Many of us have found ourselves nodding passively to 
a monologue of “if and only if” statements, but the visible inferences
of a blackboard display tend to provoke more active involvement. Such 
a display shows the whole argument at a glance, yet each inference is 
easily isolated. 


A CONVERSE FOR HÖLDER


In logic, everyone knows that the converse of the inference A ⇒ B
is the inference B ⇒ A, but in the theory of inequalities the notion
of a converse is more ambiguous. Nevertheless, there is a result that
deserves to be called the converse Hölder inequality, and it provides our
next challenge problem.


Problem 9.2 (The Hölder Converse: The Door to Duality)
Show that if 1 < p < ∞ and if C is a constant such that

$$\sum_{k=1}^n a_k x_k \le C \left\{ \sum_{k=1}^n |x_k|^p \right\}^{1/p} \tag{9.8}$$

for all $x_k$, 1 ≤ k ≤ n, then for q = p/(p − 1) one has the bound

$$\left\{ \sum_{k=1}^n |a_k|^q \right\}^{1/q} \le C. \tag{9.9}$$
k=1 


HOW TO UNTANGLE THE UNWANTED VARIABLES


This problem helps to explain the inevitability of Riesz’s conjugate 
pairs (p,q), and, to some extent, the simple conclusion is surprising. 
Nonlinear constraints are notoriously awkward, and here we see that we 
have x-variables tangled up on both sides of the hypothesis (9.8). We 
need a trick if we want to eliminate them. 

One idea that sometimes works when we have free variables on both
sides of a relation is to conspire to make the two sides as similar as
possible. This “principle of similar sides” is necessarily vague, but here
it may suggest that for each 1 ≤ k ≤ n we should choose $x_k$ such that
$a_k x_k = |x_k|^p$; in other words, we set $x_k = \mathrm{sign}(a_k)|a_k|^{1/(p-1)}$, where
sign($a_k$) is 1 if $a_k \ge 0$ and it is −1 if $a_k < 0$. With this choice the
condition (9.8) becomes

$$\sum_{k=1}^n |a_k|^{p/(p-1)} \le C \left\{ \sum_{k=1}^n |a_k|^{p/(p-1)} \right\}^{1/p}. \tag{9.10}$$


We can assume without loss of generality that the sum on the right is
nonzero, so it is safe to divide by that sum. The relation 1/p + 1/q = 1
gives p/(p − 1) = q and 1 − 1/p = 1/q, so the division yields
$\{\sum_k |a_k|^q\}^{1/q} \le C$, and we have indeed proved our target bound (9.9).
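One can check numerically that the choice $x_k = \mathrm{sign}(a_k)|a_k|^{1/(p-1)}$ makes the ratio $\sum_k a_k x_k / \{\sum_k |x_k|^p\}^{1/p}$ equal to $\{\sum_k |a_k|^q\}^{1/q}$, so no constant C smaller than that can satisfy (9.8). The sketch assumes p > 1 and that a is not identically zero; `extremal_ratio` is a hypothetical helper.

```python
import math

def extremal_ratio(a, p):
    """Plug x_k = sign(a_k)|a_k|^(1/(p-1)) into sum a_k x_k / (sum |x_k|^p)^(1/p) and
    return it together with (sum |a_k|^q)^(1/q), where q = p/(p - 1).
    Assumes p > 1 and that a is not identically zero."""
    q = p / (p - 1.0)
    x = [math.copysign(abs(ak) ** (1.0 / (p - 1.0)), ak) for ak in a]
    ratio = sum(ak * xk for ak, xk in zip(a, x)) / sum(abs(xk) ** p for xk in x) ** (1.0 / p)
    norm_q = sum(abs(ak) ** q for ak in a) ** (1.0 / q)
    return ratio, norm_q

a, p = [3.0, -1.0, 2.0], 3.0
ratio, norm_q = extremal_ratio(a, p)
print(ratio, norm_q)   # the two values agree
```

By Hölder's inequality, any other choice of x gives a ratio no larger than norm_q, so this choice is extremal.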


A SHORTHAND DESIGNED FOR HÖLDER’S INEQUALITY


Hölder’s inequality and the duality bound (9.9) can be recast in several
forms, but to give the nicest of these it will be useful to introduce some
shorthand. If $a = (a_1, a_2, \ldots, a_n)$ is an n-tuple of real numbers and
1 ≤ p < ∞, we will write

$$\|a\|_p = \left( \sum_{k=1}^n |a_k|^p \right)^{1/p}, \tag{9.11}$$

while for p = ∞ we simply set $\|a\|_\infty = \max_{1 \le k \le n} |a_k|$. With this notation,
Hölder’s inequality (9.1) for 1 ≤ p < ∞ then takes on the simple form

$$\left| \sum_{k=1}^n a_k b_k \right| \le \|a\|_p \|b\|_q,$$

where for 1 < p < ∞ the pair (p, q) are the usual conjugates which are
determined by the relation

$$\frac{1}{p} + \frac{1}{q} = 1 \quad\text{when } 1 < p < \infty,$$

but for p = 1 we simply set q = ∞.

The quantity $\|a\|_p$ is called the p-norm, or the $\ell^p$-norm, of the n-tuple,
but, to justify this name, one needs to check that the function $a \mapsto \|a\|_p$
does indeed satisfy all of the properties required by the definition of a norm;
specifically, one needs to verify the three properties:

(i) $\|a\|_p = 0$ if and only if a = 0,
(ii) $\|\alpha a\|_p = |\alpha|\,\|a\|_p$ for all α ∈ ℝ, and
(iii) $\|a + b\|_p \le \|a\|_p + \|b\|_p$ for all real n-tuples a and b.


The first two properties are immediate from the definition (9.11), but 
the third property is more substantial. It is known as Minkowski’s in- 
equality, and, even though it is not difficult to prove, the result is a 
fundamental one which deserves to be framed as a challenge problem. 
Problem 9.3 (Minkowski’s Inequality)
Show that for each $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$ one
has

$$\|a + b\|_p \le \|a\|_p + \|b\|_p, \tag{9.12}$$

or, in longhand, show that for all p ≥ 1 one has the bound

$$\left( \sum_{k=1}^n |a_k + b_k|^p \right)^{1/p} \le \left( \sum_{k=1}^n |a_k|^p \right)^{1/p} + \left( \sum_{k=1}^n |b_k|^p \right)^{1/p}. \tag{9.13}$$

Moreover, show that if $\|a\|_p \ne 0$ and if p > 1, then one has equality in
the bound (9.12) if and only if (1) there exists a constant λ ∈ ℝ such that
$|b_k| = \lambda |a_k|$ for all k = 1, 2, ..., n, and (2) $a_k$ and $b_k$ have the same sign
for each k = 1, 2, ..., n.


RIESZ’S ARGUMENT FOR MINKOWSKI’S INEQUALITY 


There are many ways to prove Minkowski’s inequality, but the method 
used by F. Riesz is a compelling favorite — especially if one is asked to 
prove Minkowski’s inequality immediately after a discussion of Hölder’s
inequality. One simply asks, “How can Hölder help?” Soon thereafter,
algebra can be our guide. 

Since we seek an upper bound which is the sum of two terms, it is 
reasonable to break our sum into two parts: 

$$\sum_{k=1}^n |a_k + b_k|^p \le \sum_{k=1}^n |a_k|\,|a_k + b_k|^{p-1} + \sum_{k=1}^n |b_k|\,|a_k + b_k|^{p-1}. \tag{9.14}$$

This decomposition already gives us Minkowski’s inequality (9.13) for
p = 1, so we may now assume p > 1. If we then apply Hölder’s inequality
separately to each of the bounding sums (9.14), we find for the first sum
that

$$\sum_{k=1}^n |a_k|\,|a_k + b_k|^{p-1} \le \left( \sum_{k=1}^n |a_k|^p \right)^{1/p} \left( \sum_{k=1}^n |a_k + b_k|^p \right)^{(p-1)/p},$$

while for the second we find

$$\sum_{k=1}^n |b_k|\,|a_k + b_k|^{p-1} \le \left( \sum_{k=1}^n |b_k|^p \right)^{1/p} \left( \sum_{k=1}^n |a_k + b_k|^p \right)^{(p-1)/p}.$$

Thus, in our shorthand notation the factorization (9.14) gives us

$$\|a + b\|_p^p \le \|a\|_p \cdot \|a + b\|_p^{p-1} + \|b\|_p \cdot \|a + b\|_p^{p-1}. \tag{9.15}$$
Since Minkowski’s inequality (9.12) is trivial when $\|a + b\|_p = 0$, we can
assume without loss of generality that $\|a + b\|_p \ne 0$. We then divide
both sides of the bound (9.15) by $\|a + b\|_p^{p-1}$ to complete the proof.
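A randomized check of Minkowski's inequality (9.12), including the endpoint cases p = 1 and p = ∞, might look like the following sketch. The helper `p_norm`, the sample sizes, and the list of exponents are arbitrary assumptions made for illustration.

```python
import math
import random

def p_norm(a, p):
    """||a||_p as in (9.11); p = math.inf gives max |a_k|."""
    if math.isinf(p):
        return max(abs(x) for x in a)
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

rng = random.Random(7)
worst = math.inf
for _ in range(2000):
    p = rng.choice([1.0, 1.5, 2.0, 3.0, 7.0, math.inf])
    a = [rng.uniform(-1, 1) for _ in range(5)]
    b = [rng.uniform(-1, 1) for _ in range(5)]
    slack = p_norm(a, p) + p_norm(b, p) - p_norm([x + y for x, y in zip(a, b)], p)
    worst = min(worst, slack)
print(worst)  # (9.12) says the slack is never negative, up to rounding
```

Taking b to be a nonnegative multiple of a, so that $|b_k| = \lambda|a_k|$ with matching signs, produces the equality case described in Problem 9.3.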


A HIDDEN BENEFIT: THE CASE OF EQUALITY 


One virtue of Riesz’s method for proving Minkowski’s inequality (9.12)
is that his argument may be worked backwards to determine the case of
equality. Conceptually the plan is simple, but some of the details can
seem fussy.

To begin, we note that equality in Minkowski’s bound (9.12) implies
equality in our first step (9.14) and that $|a_k + b_k| = |a_k| + |b_k|$ for each
1 ≤ k ≤ n. Thus, we may assume that $a_k$ and $b_k$ are of the same sign
for all 1 ≤ k ≤ n, and in fact there is no loss of generality if we assume
$a_k \ge 0$ and $b_k \ge 0$ for all 1 ≤ k ≤ n.

Equality in Minkowski’s bound (9.12) also implies that we have equality
in both of our applications of Hölder’s inequality, so, assuming that
$\|a + b\|_p \ne 0$, we deduce that there exists λ ≥ 1 such that

$$\lambda |a_k|^p = \{|a_k + b_k|^{p-1}\}^q = |a_k + b_k|^p,$$

and there exists λ′ ≥ 1 such that

$$\lambda' |b_k|^p = \{|a_k + b_k|^{p-1}\}^q = |a_k + b_k|^p.$$

From these identities, we see that if we set λ″ = λ/λ′, then we have
$\lambda'' |a_k|^p = |b_k|^p$ for all k = 1, 2, ..., n.

This is precisely the characterization which we hoped to prove. Still, 
on principle, every backtrack argument deserves to be put to the test; 
one should prod the argument to see that it is truly airtight. This is 
perhaps best achieved with help from a semi-graphical display analogous 
to Figure 9.1. 


SUBADDITIVITY AND QUASILINEARIZATION 


Minkowski’s inequality tells us that the function $h : \mathbb{R}^n \to \mathbb{R}$ defined
by $h(a) = \|a\|_p$ is subadditive in the sense that one has the bound

$$h(a + b) \le h(a) + h(b) \quad\text{for all } a, b \in \mathbb{R}^n.$$


Subadditive relations are typically much more obvious than Riesz’s proof, 
and one may wonder if there is some way to see Minkowski’s inequality 
at a glance. The next challenge problem confirms this suspicion and
throws added precision into the bargain. 




Problem 9.4 (Quasilinearization of the $\ell^p$ Norm)
Show that for all 1 ≤ p ≤ ∞ one has the identity

$$\|a\|_p = \max\left\{ \sum_{k=1}^n a_k x_k : \|x\|_q = 1 \right\}, \tag{9.16}$$

where $a = (a_1, a_2, \ldots, a_n)$ and where p and q are conjugate (so one has
q = p/(p − 1) when 1 < p < ∞, but q = ∞ when p = 1 and q = 1 when
p = ∞). Finally, explain why this identity yields Minkowski’s inequality
without any further computation.


QUASILINEARIZATION IN CONTEXT 


Before addressing the problem, it may be useful to add some context.
If V is a vector space (such as $\mathbb{R}^n$) and if $L : V \times W \to \mathbb{R}$ is a function
which is additive in its first variable, L(a + b, w) = L(a, w) + L(b, w),
then the function h : V → ℝ, defined by

$$h(a) = \max_{w \in W} L(a, w), \tag{9.17}$$

will always be subadditive simply because two choices are always at least
as good as one:

$$h(a + b) = \max_{w \in W} L(a + b, w) = \max_{w \in W} \{L(a, w) + L(b, w)\} \le \max_{w \in W} L(a, w) + \max_{w \in W} L(b, w) = h(a) + h(b).$$
The formula (9.17) is said to be a quasilinear representation of h, and 
many of the most fundamental quantities in the theory of inequalities 
have analogous representations. 


CONFIRMATION OF THE IDENTITY 


The existence of a quasilinear representation (9.16) for the function
$h(a) = \|a\|_p$ is an easy consequence of Hölder’s inequality and its converse.
Nevertheless, the logic is slippery, and it is useful to be explicit.
To begin, we consider the set

$$S = \left\{ \sum_{k=1}^n a_k x_k : \sum_{k=1}^n |x_k|^q \le 1 \right\},$$

and we note that Hölder’s inequality implies $s \le \|a\|_p$ for all s ∈ S.
This gives us our first bound, $\max\{s \in S\} \le \|a\|_p$. Next, just by the
definition of S and by scaling we have

$$\sum_{k=1}^n a_k y_k \le \|y\|_q \max\{s \in S\} \quad\text{for all } y \in \mathbb{R}^n. \tag{9.18}$$
k=1 
Thus, by the converse Hölder bound (9.9) for the conjugate pair (q, p),
as opposed to the pair (p, q) in the statement of the bound (9.9),
we have our second bound, $\|a\|_p \le \max\{s \in S\}$. The first and second
bounds now combine to give us the quasilinear representation (9.16) for
$h(a) = \|a\|_p$.
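For 1 < p < ∞ the maximum in (9.16) can also be exhibited directly. Take the extremal choice from the converse argument with the roles of p and q exchanged, $x_k = \mathrm{sign}(a_k)|a_k|^{p-1}$, and rescale it to $\|x\|_q = 1$. This is our own explicit construction, easily checked by hand, rather than a formula stated in the text. The sketch below checks it numerically, assuming a is not identically zero; the helper names are hypothetical.

```python
import math

def p_norm(a, p):
    """||a||_p for 1 <= p < infinity."""
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

def maximizing_x(a, p):
    """A vector x with ||x||_q = 1 that attains the max in (9.16), for 1 < p < infinity and a != 0:
    x_k = sign(a_k) |a_k|^(p-1) / ||a||_p^(p-1), so that sum a_k x_k = ||a||_p."""
    scale = p_norm(a, p) ** (p - 1.0)
    return [math.copysign(abs(ak) ** (p - 1.0), ak) / scale for ak in a]

a, p = [1.0, -2.0, 0.5], 2.5
q = p / (p - 1.0)
x_star = maximizing_x(a, p)
print(p_norm(x_star, q), sum(ak * xk for ak, xk in zip(a, x_star)), p_norm(a, p))
```

Because each value $\sum_k a_k x_k$ is additive in a, the subadditivity argument for (9.17) applies, and Minkowski's inequality follows with no further computation.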


A STABILITY RESULT FOR HÖLDER’S INEQUALITY


In many areas of mathematics one finds both characterization results 
and stability results. A characterization result typically provides a con- 
crete characterization of the solutions of some equation, while the asso- 
ciated stability result asserts that if the equation “almost holds” then 
the characterization “almost applies.” 

There are many examples of stability results in the theory of inequal- 
ities. We have already seen that the case of equality in the AM-GM 
bound has a corresponding stability result (Exercise 2.12, page 35), and 
it is natural to ask if Hölder’s inequality might also be amenable to such
a development. 

To make this suggestion specific, we first note that the 1-trick and
Hölder’s inequality imply that for each p > 1 and for each sequence of
nonnegative real numbers $a_1, a_2, \ldots, a_n$, one has the bound

$$\sum_{j=1}^n a_j \le n^{(p-1)/p} \left( \sum_{j=1}^n a_j^p \right)^{1/p}.$$

If we then define the difference defect δ(a) by setting

$$\delta(a) = \sum_{j=1}^n a_j^p - n^{1-p} \left( \sum_{j=1}^n a_j \right)^p, \tag{9.19}$$

then one has δ(a) ≥ 0, but, more to the point, the criterion for equality
in Hölder’s bound now tells us that δ(a) = 0 if and only if there is
a constant μ such that $a_j = \mu$ for all j = 1, 2, ..., n. That is, the
condition δ(a) = 0 characterizes the vector $a = (a_1, a_2, \ldots, a_n)$ as a
constant vector.

This characterization leads in turn to a variety of stability results, 
and our next challenge problem focuses on one of the most pleasing of 
these. It also introduces an exceptionally general technique for exploiting 
estimates of sums of squares. 
Problem 9.5 (A Stability Result for Hölder’s Inequality)

Show that if p ≥ 2 and if $a_j \ge 0$ for all 1 ≤ j ≤ n, then there exists a
constant λ = λ(a, p) such that

$$a_j \in \left[ (\lambda - \delta^{1/2})^{2/p},\ (\lambda + \delta^{1/2})^{2/p} \right] \quad\text{for all } j = 1, 2, \ldots, n. \tag{9.20}$$

In other words, show that if the difference defect δ = δ(a) is small, then
the sequence $a_1, a_2, \ldots, a_n$ is almost constant.


ORIENTATION 


There are many ways to express the idea that a sequence is almost 
constant, and the specific formula (9.20) used here is just one of several 
possibilities. Nevertheless, this choice does give us a hint about how we 
might proceed. 

The relation (9.20) may be written more sensibly as $(a_j^{p/2} - \lambda)^2 \le \delta(a)$,
and we can prove all of the individual bounds (9.20) in a single step if
we can prove the stronger conjecture that there exists a constant λ for
which we have the bound

$$\sum_{j=1}^n (a_j^{p/2} - \lambda)^2 \le \delta(a). \tag{9.21}$$


It is possible, of course, that the inequality (9.21) asks for too much, but 
it is such a nice conjecture that it deserves some attention. 


WHY IS IT NICE?


First of all, if $p = 2$, then one finds by direct computation from the
definition of $\delta(a)$ that the bound (9.21) is actually an identity, provided
that one takes $\lambda = (a_1 + a_2 + \cdots + a_n)/n$. It is always a good sign when
a conjecture is known to be true in some special case. 

A more subtle charm of the conjecture (9.21) is that it asks us
indirectly if a certain quadratic polynomial has a real root. Namely, if the
inequality (9.21) holds for some real $\lambda$, then by continuity (the left side
of (9.21) grows without bound as $\lambda \to \infty$) there must also exist a real
$\lambda$ that satisfies the equation

$$\sum_{j=1}^n (a_j^{p/2} - \lambda)^2 = \delta(a) = \sum_{j=1}^n a_j^p - n^{1-p}\Big(\sum_{j=1}^n a_j\Big)^p.$$

After algebraic expansion and simplification, we therefore find that
the conjecture (9.21) is true if and only if there is a real root of the
equation

$$n\lambda^2 - 2\lambda\sum_{j=1}^n a_j^{p/2} + n^{1-p}\Big(\sum_{j=1}^n a_j\Big)^p = 0. \tag{9.22}$$


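Since a quadratic equation $A\lambda^2 + 2B\lambda + C = 0$ has a real root if and
only if $AC \le B^2$, and since here $A = n$, $B = -\sum_{j=1}^n a_j^{p/2}$, and
$C = n^{1-p}\big(\sum_{j=1}^n a_j\big)^p$, we see that the solution to the challenge
problem will be complete if we can show

$$n^{2-p}\Big(\sum_{j=1}^n a_j\Big)^p \le \Big(\sum_{j=1}^n a_j^{p/2}\Big)^2. \tag{9.23}$$

Fortunately, it is easy to see that this bound holds; in fact, it is just
another corollary of Hölder’s inequality and the 1-trick. To be explicit,
one just applies Hölder’s inequality with $p' = p/2$ and $q' = p/(p-2)$ to
the sum $a_1 \cdot 1 + a_2 \cdot 1 + \cdots + a_n \cdot 1$ (when $p = 2$ the bound (9.23) is an identity).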
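
As a numerical sanity check on Problem 9.5 (an illustration, not a proof), the sketch below computes the defect (9.19) and compares it with the smallest possible value of the sum in (9.21). It uses the fact that $\sum_j (a_j^{p/2} - \lambda)^2$ is minimized by taking $\lambda$ to be the mean of the values $a_j^{p/2}$. The helper names `difference_defect` and `stability_check`, the use of NumPy, and the random data are all choices made for this illustration.

```python
import numpy as np

def difference_defect(a, p):
    """delta(a) = sum_j a_j^p - n^(1-p) * (sum_j a_j)^p, the defect of (9.19)."""
    n = len(a)
    return np.sum(a**p) - n**(1.0 - p) * np.sum(a)**p

def stability_check(a, p):
    """Return (lam, spread, delta) with lam = mean of the values a_j^(p/2).

    This lam minimizes spread = sum_j (a_j^(p/2) - lam)^2, so the conjecture
    (9.21) holds for some lam exactly when spread <= delta for this lam.
    """
    b = a**(p / 2)
    lam = b.mean()
    spread = np.sum((b - lam)**2)
    return lam, spread, difference_defect(a, p)

rng = np.random.default_rng(0)
p = 3.0
a = rng.exponential(size=8)
lam, spread, delta = stability_check(a, p)
print(f"lambda = {lam:.4f}, spread = {spread:.4f}, delta = {delta:.4f}")
# Consequence (9.20): each a_j^(p/2) lies within sqrt(delta) of lambda.
print(bool(np.all(np.abs(a**(p / 2) - lam) <= np.sqrt(delta))))
```

For $p = 2$ the two printed quantities `spread` and `delta` agree exactly, which is the identity case noted under WHY IS IT NICE?.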
INTERPOLATION

The $\ell^1$ norm and the $\ell^\infty$ norm represent the two natural extremes
among the $\ell^p$ norms, and it is reasonable to guess that in favorable
circumstances one should be able to combine an $\ell^1$ inequality and an
$\ell^\infty$ inequality to get an analogous inequality for the $\ell^p$ norm where
$1 < p < \infty$.

Our final challenge problem provides an important example of this 
possibility. It also points the way to one of the most pervasive themes 
in the theory of inequalities — interpolation. 


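Problem 9.6 (An Illustration of $\ell^1$–$\ell^\infty$ Interpolation)
Let $c_{jk}$, $1 \le j \le m$, $1 \le k \le n$, be an array of nonnegative real
numbers such that

$$\sum_{j=1}^m\Big|\sum_{k=1}^n c_{jk}x_k\Big| \le A\sum_{k=1}^n|x_k| \quad\text{and}\quad \max_{1\le j\le m}\Big|\sum_{k=1}^n c_{jk}x_k\Big| \le B\max_{1\le k\le n}|x_k|$$

for all $x_k$, $1 \le k \le n$. If $1 < p < \infty$ and $q = p/(p-1)$, show that one
also has the interpolation bound

$$\Big(\sum_{j=1}^m\Big|\sum_{k=1}^n c_{jk}x_k\Big|^p\Big)^{1/p} \le A^{1/p}B^{1/q}\Big(\sum_{k=1}^n|x_k|^p\Big)^{1/p}. \tag{9.24}$$
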
SEARCH FOR A SIMPLER FORMULATION


The feature of the inequality (9.24) which may seem troublesome is
the presence of the pth roots; one quickly starts to hunger for a way to
make them disappear. The root on the right side is not a problem since
by scaling $x$ we can assume without loss of generality that $\|x\|_p \le 1$,
but what can we do about the pth root on the left side?

Luckily, we have a tool that is well suited to the task. The converse
of Hölder’s inequality (page 139) tells us that to prove the bound (9.24)
it suffices to show that, for all real vectors $x$ and $y$ such that $\|x\|_p \le 1$
and $\|y\|_q \le 1$, one has

$$\sum_{j=1}^m\sum_{k=1}^n c_{jk}x_ky_j \le A^{1/p}B^{1/q}. \tag{9.25}$$

Moreover, since we assume that $c_{jk} \ge 0$ for all $j$ and $k$, it suffices to
prove the bound just for $\|x\|_p \le 1$ and $\|y\|_q \le 1$ with $x_k \ge 0$ and $y_j \ge 0$
for all $j$ and $k$.

The reformulation (9.25) offers signs of real progress; in particular, 
the pth roots are gone. We now face a problem of the kind we have met 
several times before; we simply need to estimate a sum subject to some 
nonlinear constraints. 


FROM FORMULATION TO FINISH 


In the past, the splitting trick has been a great help with such bounds,
and here it is natural to take a clue from the relation $1/p + 1/q = 1$. By
splitting and by Hölder’s inequality we find

$$\sum_{j=1}^m\sum_{k=1}^n c_{jk}x_ky_j = \sum_{j=1}^m\sum_{k=1}^n (c_{jk}x_k^p)^{1/p}(c_{jk}y_j^q)^{1/q} \le \Big(\sum_{j=1}^m\sum_{k=1}^n c_{jk}x_k^p\Big)^{1/p}\Big(\sum_{j=1}^m\sum_{k=1}^n c_{jk}y_j^q\Big)^{1/q}, \tag{9.26}$$

and now we just need to estimate the last two factors.
The first factor is easy, since our first hypothesis (applied to the vector
with coordinates $x_k^p$) and the assumption $\|x\|_p \le 1$ give us the bound

$$\sum_{j=1}^m\sum_{k=1}^n c_{jk}x_k^p \le A\sum_{k=1}^n x_k^p \le A. \tag{9.27}$$

Estimation of the second is not much harder. Our second hypothesis with
$x_k = 1$ for all $k$ gives $\sum_{k=1}^n c_{jk} \le B$ for each $j$, so after this one crude
bound the assumption $\|y\|_q \le 1$ gives us

$$\sum_{j=1}^m\sum_{k=1}^n c_{jk}y_j^q = \sum_{j=1}^m y_j^q\sum_{k=1}^n c_{jk} \le B\sum_{j=1}^m y_j^q \le B. \tag{9.28}$$

Finally, when we use the estimates (9.27) and (9.28) to estimate the
product (9.26), we get our target bound (9.25), and thus we complete
the solution of Problem 9.6.
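
The sketch below checks the interpolation bound (9.24) numerically on random data; it illustrates the statement and is not part of the proof. For a nonnegative array the two hypotheses of Problem 9.6 hold with $A$ equal to the largest column sum and $B$ equal to the largest row sum, so the sketch uses those values. The function name `interpolation_sides` and the reliance on NumPy are assumptions of this illustration.

```python
import numpy as np

def interpolation_sides(c, x, p):
    """Return both sides of (9.24) for a nonnegative m x n array c."""
    q = p / (p - 1)
    A = c.sum(axis=0).max()   # l^1 hypothesis constant: max_k sum_j c_jk
    B = c.sum(axis=1).max()   # l^inf hypothesis constant: max_j sum_k c_jk
    lhs = np.sum(np.abs(c @ x)**p)**(1 / p)
    rhs = A**(1 / p) * B**(1 / q) * np.sum(np.abs(x)**p)**(1 / p)
    return lhs, rhs

rng = np.random.default_rng(0)
c = rng.uniform(size=(5, 7))
x = rng.normal(size=7)   # signed entries are allowed in (9.24)
print(interpolation_sides(c, x, p=3.0))
```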


EXERCISES 


Exercise 9.1 (Doing the Sums for Hölder)

In Exercise 1.8 we saw that the effective use of Cauchy’s inequality
may depend on having an estimate for one of the bounding sums and, in
this respect, Hölder’s inequality is a natural heir. As a warm-up, check
that for real $a_j$, $j = 1, 2, \ldots$, one has


4/5 


n n / 
Qk * / a 
», {k(k+ 1)}4/5 < (SI il ‘) ’ ( ) 


eee 64a Solanl) and (b) 
ra VE | 


/3 


Sans! <( 2 (S tap?) forO<a<l. (c) 


Exercise 9.2 (An Inclusion Radius Bound) 

For a polynomial $P(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0$ with real
or complex coefficients, the smallest value $r(P)$ such that all roots of $P$ are
contained in the disk $\{z : |z| \le r(P)\}$ is called the inclusion radius for
$P$. Show that for any conjugate pair $p > 1$ and $q = p/(p-1) > 1$ one
has the bound

$$r(P) \le (1 + A_p^q)^{1/q} \quad\text{where}\quad A_p = \Big(\sum_{j=0}^{n-1}|a_j|^p\Big)^{1/p}. \tag{9.29}$$
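
A quick numerical check of (9.29), offered as an illustration only, compares the largest root modulus reported by NumPy's `numpy.roots` with the bound for several conjugate pairs. The helper names are hypothetical, and the coefficient list is assumed to be ordered as $[a_0, a_1, \ldots, a_{n-1}]$ for the monic polynomial.

```python
import numpy as np

def inclusion_radius(coeffs):
    """Largest root modulus of z^n + a_{n-1} z^{n-1} + ... + a_0, coeffs = [a_0, ..., a_{n-1}]."""
    # numpy.roots expects the highest-degree coefficient first.
    return float(np.abs(np.roots([1.0] + list(coeffs[::-1]))).max())

def holder_radius_bound(coeffs, p):
    """Right side of (9.29) for the conjugate pair p, q = p/(p-1)."""
    q = p / (p - 1)
    A_p = np.sum(np.abs(coeffs)**p)**(1 / p)
    return float((1 + A_p**q)**(1 / q))

coeffs = np.array([-2.0, 0.0, 0.0])   # P(z) = z^3 - 2
print(inclusion_radius(coeffs), [holder_radius_bound(coeffs, p) for p in (1.5, 2.0, 4.0)])
```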




Exercise 9.3 (Cauchy Implies Hölder)

Prove that Cauchy’s inequality implies Hölder’s inequality. More
specifically, show that Cauchy’s inequality implies Hölder’s inequality
for $p \in \{8/1, 8/2, 8/3, \ldots, 8/6, 8/7\}$ by first showing

$$\Big\{\sum_{j=1}^n a_jb_jc_jd_je_jf_jg_jh_j\Big\}^8 \le \Big\{\sum_{j=1}^n a_j^8\Big\}\Big\{\sum_{j=1}^n b_j^8\Big\}\cdots\Big\{\sum_{j=1}^n h_j^8\Big\}.$$

By the same method, one can prove Hölder’s inequality for all $p = 2^k/j$,
$1 \le j < 2^k$. One can then call on continuity to obtain Hölder’s inequality
for all $1 < p < \infty$.

This argument serves as a reminder that an $\ell^2$-result may sometimes
be applied iteratively to obtain an $\ell^p$-result. The inequalities one finds
this way are often proved more elegantly by other methods, but iteration
is still a remarkably effective tool for the discovery of new bounds.
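
The iteration can be made concrete with a small recursive sketch. It uses nothing but Cauchy’s inequality, splitting the list of factors into two halves at each level, and for 8 factors it reproduces the right side of the display above (after taking eighth roots). The helper `cauchy_tree_bound` is a hypothetical name, and the random vectors stand in for $a, b, \ldots, h$.

```python
import numpy as np

def cauchy_tree_bound(vectors):
    """Bound sum_j prod_i |v_i[j]| using nothing but Cauchy's inequality.

    The list of factors is split into two halves, Cauchy's inequality is applied
    to the two half-products, and each resulting sum of squares is bounded by
    the same procedure. len(vectors) must be a power of 2.
    """
    if len(vectors) == 1:
        return float(np.sum(np.abs(vectors[0])))
    half = len(vectors) // 2
    left = cauchy_tree_bound([v**2 for v in vectors[:half]])
    right = cauchy_tree_bound([v**2 for v in vectors[half:]])
    return float(np.sqrt(left * right))

rng = np.random.default_rng(0)
vectors = [rng.normal(size=6) for _ in range(8)]   # stand-ins for a, b, ..., h
lhs = abs(np.sum(np.prod(vectors, axis=0)))
print(lhs, cauchy_tree_bound(vectors))
```

By induction on the number of halvings, the value returned for $N = 2^k$ factors equals $\prod_i \big(\sum_j |v_i[j]|^N\big)^{1/N}$.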


Exercise 9.4 (Interpolation Bound for Moment Sequences)
If $\phi : [0, \infty) \to [0, \infty)$ is an integrable function and $t \in (0, \infty)$, then
the integral

$$\mu_t = \int_0^\infty x^t\phi(x)\,dx$$

is called the tth moment of $\phi$. Show that if $t \in (t_0, t_1)$ then

$$\mu_t \le \mu_{t_0}^{1-\alpha}\mu_{t_1}^{\alpha} \quad\text{where}\quad t = (1-\alpha)t_0 + \alpha t_1 \text{ and } 0 < \alpha < 1.$$

In other words, the linearly interpolated moment is bounded by the
geometric interpolation of two extreme moments.
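
For a concrete instance, take the illustrative choice $\phi(x) = e^{-x}$. Its moments are $\mu_t = \Gamma(t+1)$, so the exercise then says that $t \mapsto \log\Gamma(t+1)$ is convex. The sketch below checks this numerically with the standard-library `math.gamma`. The choice of $\phi$ and the helper names are assumptions of the example.

```python
import math

def moment(t):
    """t-th moment of the illustrative density phi(x) = exp(-x) on [0, inf): Gamma(t + 1)."""
    return math.gamma(t + 1.0)

def interpolation_gap(t0, t1, alpha):
    """mu_{t0}^(1-alpha) * mu_{t1}^alpha - mu_t, which the exercise says is >= 0."""
    t = (1 - alpha) * t0 + alpha * t1
    return moment(t0)**(1 - alpha) * moment(t1)**alpha - moment(t)

print(interpolation_gap(0.5, 4.0, 0.3))
```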


Exercise 9.5 (Complex Hölder and the Case of Equality)
Hölder’s inequality for real numbers implies that for complex numbers
$a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ one has the bound

$$\Big|\sum_{j=1}^n a_jb_j\Big| \le \Big(\sum_{j=1}^n |a_j|^p\Big)^{1/p}\Big(\sum_{j=1}^n |b_j|^q\Big)^{1/q} \tag{9.30}$$

when $p > 1$ and $q > 1$ satisfy $1/p + 1/q = 1$. What conditions on
the complex numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are necessary and
sufficient for equality to hold in the bound (9.30)? Although this exercise is
easy, it nevertheless offers one useful morsel of insight that should not
be missed.
Exercise 9.6 (Jensen Implies Minkowski)
By Jensen’s inequality, we know that for a convex $\phi$ and positive
weights $w_1, w_2, \ldots, w_n$ one has

$$\phi\Big(\frac{w_1x_1 + w_2x_2 + \cdots + w_nx_n}{w_1 + w_2 + \cdots + w_n}\Big) \le \frac{w_1\phi(x_1) + w_2\phi(x_2) + \cdots + w_n\phi(x_n)}{w_1 + w_2 + \cdots + w_n}, \tag{9.31}$$

and the inequality is reversed when $\phi$ is concave. Consider the concave
function $\phi(x) = (1 + x^{1/p})^p$ on $[0, \infty)$, where $p \ge 1$, and show
that by making the right choice of the weights $w_k$ and the values $x_k$ in
Jensen’s inequality (9.31) one obtains Minkowski’s inequality.


Exercise 9.7 (Hölder’s Inequality for Integrals)

Naturally there are integral versions of Hölder’s inequality and, in
keeping with the more modern custom, there is no cause for a name
change when one switches from sums to integrals.

Let $w : D \to [0, \infty)$ be given, and reinforce your mastery of Hölder’s
inequality by checking that our earlier argument (page 137) also shows
that, for all suitably integrable functions $f$ and $g$ from $D$ to $\mathbb{R}$,

$$\int_D f(x)g(x)w(x)\,dx \le \Big(\int_D |f(x)|^p w(x)\,dx\Big)^{1/p}\Big(\int_D |g(x)|^q w(x)\,dx\Big)^{1/q},$$

where, as usual, $1 < p < \infty$ and $p^{-1} + q^{-1} = 1$.


Exercise 9.8 (Legendre Transforms and Young’s Inequality)
If $f : (a, b) \to \mathbb{R}$, then the function $g : \mathbb{R} \to \mathbb{R}$ defined by

$$g(y) = \sup_{a<x<b}\{xy - f(x)\} \tag{9.32}$$

is called the Legendre transform of $f$. It is used widely in the theory of
inequalities, and part of its charm is that it helps us relate products to
sums. For example, the definition (9.32) gives us the immediate bound

$$xy \le f(x) + g(y) \quad\text{for all } (x, y) \in (a, b) \times \mathbb{R}. \tag{9.33}$$

(a) Find the Legendre transform of $f(x) = x^p/p$ for $p > 1$ and compare
the general bound (9.33) to Young’s inequality (9.6).

(b) Find the Legendre transforms of $f(x) = e^x$ and $f(x) = x\log x - x$.

(c) Show that for any function $f$ the Legendre transform $g$ is convex.
Exercise 9.9 (Self-Generalizations of Hölder’s Inequality)

Hölder’s inequality is self-generalizing in the sense that it implies sev-
eral apparently more general inequalities. This exercise addresses two of
the most pleasing of these generalizations.


(a) Show that for positive $p$, $q$, and $r$ with $1/p + 1/q = 1/r$ (so that $p$ and $q$
are both bigger than $r$) one has

$$\Big\{\sum_{j=1}^n |a_jb_j|^r\Big\}^{1/r} \le \Big\{\sum_{j=1}^n |a_j|^p\Big\}^{1/p}\Big\{\sum_{j=1}^n |b_j|^q\Big\}^{1/q}.$$

(b) Given $p$, $q$, and $r$ are bigger than 1, show that if

$$\frac{1}{p} + \frac{1}{q} + \frac{1}{r} = 1,$$

then one has the triple product inequality

$$\sum_{j=1}^n |a_jb_jc_j| \le \Big\{\sum_{j=1}^n |a_j|^p\Big\}^{1/p}\Big\{\sum_{j=1}^n |b_j|^q\Big\}^{1/q}\Big\{\sum_{j=1}^n |c_j|^r\Big\}^{1/r}.$$
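
Both generalizations can be spot-checked numerically. The sketch below does this for random real vectors, assuming NumPy. In part (a) the exponent $r$ is computed from $1/r = 1/p + 1/q$, and in part (b) it is computed from $1/p + 1/q + 1/r = 1$. The helper names are hypothetical.

```python
import numpy as np

def norm(v, r):
    """(sum |v_j|^r)^(1/r); for r < 1 this is not a norm, but the formula still makes sense."""
    return np.sum(np.abs(v)**r)**(1.0 / r)

def check_part_a(a, b, p, q):
    """Part (a): ||ab||_r <= ||a||_p ||b||_q with 1/r = 1/p + 1/q."""
    r = 1.0 / (1.0 / p + 1.0 / q)
    return norm(a * b, r) <= norm(a, p) * norm(b, q) * (1 + 1e-12)

def check_part_b(a, b, c, p, q):
    """Part (b): sum |a b c| <= ||a||_p ||b||_q ||c||_r with 1/p + 1/q + 1/r = 1."""
    r = 1.0 / (1.0 - 1.0 / p - 1.0 / q)   # requires 1/p + 1/q < 1
    return np.sum(np.abs(a * b * c)) <= norm(a, p) * norm(b, q) * norm(c, r) * (1 + 1e-12)

rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 10))
print(check_part_a(a, b, 0.7, 1.9), check_part_b(a, b, c, 3.0, 4.0))
```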


Exercise 9.10 (The Historical Hölder Inequality)


The inequality which Hölder actually proved in his 1889 article asserts
that for $w_k \ge 0$, $y_k \ge 0$, and $p > 1$ one has

$$\sum_{k=1}^n w_ky_k \le \Big\{\sum_{k=1}^n w_k\Big\}^{(p-1)/p}\Big\{\sum_{k=1}^n w_ky_k^p\Big\}^{1/p}. \tag{9.34}$$

Show, as Hölder did, that this inequality follows from the weighted ver-
sion (9.31) of Jensen’s inequality. Finally close the loop by showing that
the historical version (9.34) of Hölder’s inequality is equivalent to the
modern version that was introduced by F. Riesz. That is, check that
inequality (9.34) implies inequality (9.1), and vice versa.


Exercise 9.11 (Minkowski Implies Hölder)

The triangle inequality implies Cauchy’s inequality, so it surely seems
reasonable to guess that Minkowski’s inequality might also imply Hölder’s
inequality. The guess is true, but the confirmation is a bit subtle. As
a hint, consider what Minkowski’s inequality (9.12) for $\ell^s$ says for the
vectors $\theta(a_1^{p/s}, a_2^{p/s}, \ldots, a_n^{p/s})$ and $(1-\theta)(b_1^{q/s}, b_2^{q/s}, \ldots, b_n^{q/s})$ when $s$ is
very large.


Fig. 9.2. Hölder’s inequality for an array (9.35) is easier to keep in mind if one
visualizes its meaning. In fact, it asserts a natural commutativity relationship
between the summation operation S and the geometric mean operation G. As
the figure suggests, if we let G act on rows and let S act on columns, then the 
inequality (9.35) tells us that by acting first with the geometric mean G we 
get a smaller number than if we act first with S. 


Exercise 9.12 (Hölder’s Inequality for an Array)


Any formula that generalizes Hölder’s inequality to an array is likely
to look complicated but, as Figure 9.2 suggests, it is still possible for
such a formula to be conceptually simple.


Show that for nonnegative real numbers $a_{jk}$, $1 \le j \le m$, $1 \le k \le n$,
and positive weights $w_1, \ldots, w_n$ that sum to 1, we have the bound

$$\sum_{j=1}^m\prod_{k=1}^n a_{jk}^{w_k} \le \prod_{k=1}^n\Big(\sum_{j=1}^m a_{jk}\Big)^{w_k}. \tag{9.35}$$

Prove this inequality, and use it to prove the mixed mean inequality
which asserts that for nonnegative $x$, $y$, $z$ one has

$$\frac{x + (xy)^{1/2} + (xyz)^{1/3}}{3} \le \Big(x\cdot\frac{x+y}{2}\cdot\frac{x+y+z}{3}\Big)^{1/3}. \tag{9.36}$$
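
The sketch below spot-checks (9.35) and (9.36) on random data, assuming NumPy. It illustrates the statements rather than proving them. The comments mirror the picture in Figure 9.2: on the left side G acts first (along each row) and S second, while on the right side the order is reversed. The helper names are hypothetical.

```python
import numpy as np

def array_holder_sides(a, w):
    """Sides of (9.35) for a nonnegative m x n array a and weights w summing to 1."""
    lhs = np.sum(np.prod(a**w, axis=1))   # G along each row, then S over the rows
    rhs = np.prod(np.sum(a, axis=0)**w)   # S down each column, then G across columns
    return lhs, rhs

def mixed_mean_sides(x, y, z):
    """Sides of the mixed mean inequality (9.36)."""
    lhs = (x + np.sqrt(x * y) + np.cbrt(x * y * z)) / 3
    rhs = np.cbrt(x * (x + y) / 2 * (x + y + z) / 3)
    return lhs, rhs

rng = np.random.default_rng(0)
a = rng.uniform(size=(4, 3))
w = rng.dirichlet(np.ones(3))   # positive weights summing to 1
print(array_holder_sides(a, w))
print(mixed_mean_sides(1.0, 4.0, 9.0))
```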


Exercise 9.13 (Rogers’s Inequality: the Proto-Hölder)


The inequality that L.J. Rogers proved in his 1888 article asserts
that for $0 < r < s < t < \infty$ and for nonnegative $a_k$, $b_k$, $k = 1, 2, \ldots, n$,
one has the bound

$$\Big(\sum_{k=1}^n a_kb_k^s\Big)^{t-r} \le \Big(\sum_{k=1}^n a_kb_k^r\Big)^{t-s}\Big(\sum_{k=1}^n a_kb_k^t\Big)^{s-r},$$

which we may write more succinctly as

$$(S_s)^{t-r} \le (S_r)^{t-s}(S_t)^{s-r} \quad\text{where}\quad S_p = \sum_{k=1}^n a_kb_k^p \text{ for } p > 0. \tag{9.37}$$

Rogers gave two proofs of his bound (9.37). In the first of these he
called on the Cauchy–Binet formula [see (3.7), page 49], and in the second
he used the AM-GM inequality which he wrote in the form

$$x_1^{w_1}x_2^{w_2}\cdots x_n^{w_n} \le \Big(\frac{w_1x_1 + w_2x_2 + \cdots + w_nx_n}{w_1 + w_2 + \cdots + w_n}\Big)^{w_1 + w_2 + \cdots + w_n},$$

where the values $w_1, w_2, \ldots, w_n$ are assumed to be positive but are
otherwise arbitrary.

Now, follow in Rogers’s footsteps and use the very clever substitutions
$w_k = a_kb_k^s$ and $x_k = b_k^{t-s}$ to deduce the bound

$$\big(b_1^{a_1b_1^s}b_2^{a_2b_2^s}\cdots b_n^{a_nb_n^s}\big)^{t-s} \le \Big(\frac{S_t}{S_s}\Big)^{S_s}, \tag{9.38}$$

and use the substitutions $w_k = a_kb_k^s$ and $x_k = b_k^{r-s}$ to deduce the bound

$$\Big(\frac{S_s}{S_r}\Big)^{S_s} \le \big(b_1^{a_1b_1^s}b_2^{a_2b_2^s}\cdots b_n^{a_nb_n^s}\big)^{s-r}. \tag{9.39}$$

Finally, show how these two relations imply Rogers’s inequality (9.37).
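
Working with logarithms, the sketch below spot-checks Rogers’s inequality (9.37) and the two intermediate bounds (9.38) and (9.39) in the form stated above. It writes $\log P = \sum_k a_kb_k^s\log b_k$ for the logarithm of the product $b_1^{a_1b_1^s}\cdots b_n^{a_nb_n^s}$. It assumes strictly positive $a_k, b_k$ so that the logarithms exist, and the helper names are hypothetical.

```python
import numpy as np

def S(a, b, p):
    return np.sum(a * b**p)

def rogers_log_gap(a, b, r, s, t):
    """log(right side) - log(left side) of (9.37); Rogers says this is >= 0."""
    return ((t - s) * np.log(S(a, b, r)) + (s - r) * np.log(S(a, b, t))
            - (t - r) * np.log(S(a, b, s)))

def substitution_bounds(a, b, r, s, t, tol=1e-9):
    """Check the logarithmic forms of (9.38) and (9.39)."""
    Ss = S(a, b, s)
    log_P = np.sum(a * b**s * np.log(b))
    ok_38 = (t - s) * log_P <= Ss * np.log(S(a, b, t) / Ss) + tol
    ok_39 = Ss * np.log(Ss / S(a, b, r)) <= (s - r) * log_P + tol
    return bool(ok_38), bool(ok_39)

rng = np.random.default_rng(0)
a, b = rng.uniform(0.1, 2.0, size=5), rng.uniform(0.2, 3.0, size=5)
print(rogers_log_gap(a, b, 0.5, 1.5, 3.0), substitution_bounds(a, b, 0.5, 1.5, 3.0))
```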


Exercise 9.14 (Interpolation for Positive Matrices)

Let $1 \le s_0, t_0, s_1, t_1 \le \infty$ be given and consider an $m \times n$ matrix $T$
with nonnegative real entries $c_{jk}$, $1 \le j \le m$, $1 \le k \le n$. Show that if
there exist constants $M_0$ and $M_1$ such that

$$\|Tx\|_{t_0} \le M_0\|x\|_{s_0} \quad\text{and}\quad \|Tx\|_{t_1} \le M_1\|x\|_{s_1} \tag{9.40}$$

for all $x \in \mathbb{R}^n$, then for each $0 \le \theta \le 1$, one has the bound

$$\|Tx\|_t \le M_\theta\|x\|_s \quad\text{for all } x \in \mathbb{R}^n, \tag{9.41}$$

where $M_\theta$ is defined by $M_\theta = M_0^{1-\theta}M_1^{\theta}$ and where $s$ and $t$ are given by

$$\frac{1}{s} = \frac{\theta}{s_1} + \frac{1-\theta}{s_0} \quad\text{and}\quad \frac{1}{t} = \frac{\theta}{t_1} + \frac{1-\theta}{t_0}. \tag{9.42}$$

This problem takes some time to absorb, but the result is important,
and it pays generous interest on all invested effort. Figure 9.3 should help
one visualize the condition (9.42) and the constraints on the parameters
$1 \le s_0, t_0, s_1, t_1 \le \infty$. One might also note that the bound (9.41) would
follow trivially from the hypotheses (9.40) if $\theta = 0$ or $\theta = 1$. Moreover,


Fig. 9.3. The constraints $1 \le s_0, t_0, s_1, t_1 \le \infty$ mean that the reciprocals are
contained in the unit square $S = [0,1] \times [0,1]$, and the exponent relation
(9.42) tells us that $(1/s, 1/t)$ is on the line from $(1/s_1, 1/t_1)$ to $(1/s_0, 1/t_0)$.
The parameter $\theta$ is then determined by the explicit interpolation formula
$(1/s, 1/t) = \theta(1/s_1, 1/t_1) + (1 - \theta)(1/s_0, 1/t_0)$.


the bound (9.41) automatically recaptures the inequality (9.24) from
Challenge Problem 9.6; one only needs to set $t_1 = 1$, $s_1 = 1$, $M_1 = A$,
$t_0 = \infty$, $s_0 = \infty$, $M_0 = B$, and $\theta = 1/p$.

Despite the apparent complexity of Exercise 9.14, one does not need 
to look far to find a plan for proving the interpolation formula (9.41). 
The strategy which worked for Problem 9.6 (page 146) seems likely to 
work here, even though it may put one’s skill with the splitting trick to 
the test. 

Finally, for anyone who may still be hesitant to take up the challenge 
of Exercise 9.14, there is one last appeal: first think about proving the 
more concrete inequality (9.43) given below. This inequality is typical of 
a large class of apparently tough problems which crumble quickly after 
one calls on the interpolation formula (9.41). 


Exercise 9.15 (An $\ell^p$ Interpolation Bound)
Let $c_{jk}$, $1 \le j \le m$, $1 \le k \le n$, be an array of nonnegative real
numbers for which one has the implication

$$X_j = \sum_{k=1}^n c_{jk}x_k \ \text{ for all } j = 1, 2, \ldots, m \implies \sum_{j=1}^m |X_j|^2 \le \sum_{k=1}^n |x_k|^2.$$

Show that for all $1 < p \le 2$ one then has the bound

$$\Big(\sum_{j=1}^m |X_j|^q\Big)^{1/q} \le M^{(2-p)/p}\Big(\sum_{k=1}^n |x_k|^p\Big)^{1/p}, \tag{9.43}$$

where $q = p/(p-1)$ and $M = \max_{j,k} |c_{jk}|$.
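
A numerical illustration of (9.43) follows; it is a sketch, not a proof. It assumes NumPy and builds a random nonnegative array, then rescales it by its largest singular value (`numpy.linalg.norm(c, 2)`) so that the $\ell^2$ hypothesis of the exercise holds. The helper name `bound_943_sides` is hypothetical.

```python
import numpy as np

def lp(v, p):
    return np.sum(np.abs(v)**p)**(1.0 / p)

def bound_943_sides(c, x, p):
    """Both sides of (9.43) for a nonnegative array c whose l^2 operator norm is <= 1."""
    q = p / (p - 1)
    M = np.max(np.abs(c))
    X = c @ x
    return lp(X, q), M**((2 - p) / p) * lp(x, p)

rng = np.random.default_rng(0)
c = rng.uniform(size=(6, 9))
c /= np.linalg.norm(c, 2)   # enforce the l^2 hypothesis: largest singular value is 1
x = rng.normal(size=9)
print(bound_943_sides(c, x, p=1.5))
```

The two endpoint cases are the $\ell^2 \to \ell^2$ hypothesis ($p = 2$) and the trivial bound $|X_j| \le M\sum_k |x_k|$ ($p \to 1$), which are exactly the two hypotheses of Exercise 9.14 in this setting.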
Hilbert’s Inequality
and Compensating Difficulties 


Some of the most satisfying experiences in problem solving take place 
when one starts out on a natural path and then bumps into an unex- 
pected difficulty. On occasion this deeper view of the problem forces us 
to look for an entirely new approach. Perhaps more often we only need 
to find a way to press harder on an appropriate variation of the original 
plan. 

This chapter’s introductory problem provides an instructive case; here 
we will discover two difficulties. Nevertheless, we manage to achieve our 
goal by pitting one difficulty against the other. 


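Problem 10.1 (Hilbert’s Inequality)


Show that there is a constant $C$ such that for every pair of sequences
of real numbers $\{a_n\}$ and $\{b_n\}$ one has

$$\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{a_mb_n}{m+n} < C\Big(\sum_{m=1}^\infty a_m^2\Big)^{1/2}\Big(\sum_{n=1}^\infty b_n^2\Big)^{1/2}. \tag{10.1}$$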


SOME HISTORICAL BACKGROUND 


This famous inequality was discovered in the early 1900s by David
Hilbert; specifically, Hilbert proved that the inequality (10.1) holds with
$C = 2\pi$. Several years after Hilbert’s discovery, Issai Schur provided a
new proof which showed Hilbert’s inequality actually holds with $C = \pi$.
We will see shortly that no smaller value of $C$ will suffice.

Despite the similarities between Hilbert’s inequality and Cauchy’s in- 
equality, Hilbert’s original proof did not call on Cauchy’s inequality; he 
took an entirely different approach that exploited the evaluation of some 
cleverly chosen trigonometric integrals. Nevertheless, one can prove 
Hilbert’s inequality through an appropriate application of Cauchy’s
inequality. The proof turns out to be both simple and instructive.

If $S$ is any countable set and $\{\alpha_s\}$ and $\{\beta_s\}$ are collections of real
numbers indexed by $S$, then Cauchy’s inequality can be written as

$$\sum_{s\in S}\alpha_s\beta_s \le \Big(\sum_{s\in S}\alpha_s^2\Big)^{1/2}\Big(\sum_{s\in S}\beta_s^2\Big)^{1/2}. \tag{10.2}$$

This modest reformulation of Cauchy’s inequality sometimes helps us
see the possibilities more clearly, and here, of course, one hopes that
wise choices for $S$, $\{\alpha_s\}$, and $\{\beta_s\}$ will lead us from the bound (10.2) to
Hilbert’s inequality (10.1).


AN OBVIOUS FIRST ATTEMPT 


If we charge ahead without too much thought, we might simply take
the index set to be $S = \{(m, n) : m \ge 1, n \ge 1\}$ and take $\alpha_s$ and $\beta_s$ to
be defined by the splitting

$$\alpha_s = \frac{a_m}{\sqrt{m+n}} \quad\text{and}\quad \beta_s = \frac{b_n}{\sqrt{m+n}} \quad\text{where } s = (m, n).$$

By design, the products $\alpha_s\beta_s$ recapture the terms one finds on the
left-hand side of Hilbert’s inequality, but the bound one obtains from
Cauchy’s inequality (10.2) turns out to be disappointing. Specifically, it
gives us the double sum estimate

$$\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{a_mb_n}{m+n} \le \Big(\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{a_m^2}{m+n}\Big)^{1/2}\Big(\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{b_n^2}{m+n}\Big)^{1/2}, \tag{10.3}$$

but, unfortunately, both of the last two factors turn out to be infinite.

The first factor on the right side of the bound (10.3) diverges like a
harmonic series when we sum on $n$, and the second factor diverges like
a harmonic series when we sum on $m$. Thus, in itself, inequality (10.3)
is virtually worthless. Nevertheless, if we look more deeply, we soon
find that the complementary nature of these failings points the way to
a wiser choice of $\{\alpha_s\}$ and $\{\beta_s\}$.


EXPLOITING COMPENSATING DIFFICULTIES 


The two sums on the right-hand side of the naive bound (10.3) diverge,
but the good news is that they diverge for different reasons. In a sense,
the first factor diverges because

$$\alpha_s = \frac{a_m}{\sqrt{m+n}}$$

is too big as a function of $n$, whereas the second factor diverges because

$$\beta_s = \frac{b_n}{\sqrt{m+n}}$$

is too big as a function of $m$. All told, this suggests that we might
improve on $\alpha_s$ and $\beta_s$ if we multiply $\alpha_s$ by a decreasing function of $n$
and multiply $\beta_s$ by a decreasing function of $m$. Since we want to preserve
the basic property that

$$\alpha_s\beta_s = \frac{a_mb_n}{m+n},$$

we may not need long to hit on the idea of introducing a parametric
family of candidates such as

$$\alpha_s = \frac{a_m}{\sqrt{m+n}}\Big(\frac{m}{n}\Big)^{\lambda} \quad\text{and}\quad \beta_s = \frac{b_n}{\sqrt{m+n}}\Big(\frac{n}{m}\Big)^{\lambda}, \tag{10.4}$$

where $s = (m, n)$ and where $\lambda > 0$ is a constant that can be chosen
later. This new family of candidates turns out to lead us quickly to the
proof of Hilbert’s inequality.


EXECUTION OF THE PLAN 


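When we apply Cauchy’s inequality (10.2) to the pair (10.4), we find

$$\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{a_mb_n}{m+n} \le \Big(\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{a_m^2}{m+n}\Big(\frac{m}{n}\Big)^{2\lambda}\Big)^{1/2}\Big(\sum_{n=1}^\infty\sum_{m=1}^\infty\frac{b_n^2}{m+n}\Big(\frac{n}{m}\Big)^{2\lambda}\Big)^{1/2},$$

so, when we consider the first factor on the right-hand side we see

$$\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{a_m^2}{m+n}\Big(\frac{m}{n}\Big)^{2\lambda} = \sum_{m=1}^\infty a_m^2\sum_{n=1}^\infty\frac{1}{m+n}\Big(\frac{m}{n}\Big)^{2\lambda}.$$

By the symmetry of the summands $a_mb_n/(m+n)$ in our target sum, we
now see that the proof of Hilbert’s inequality will be complete if we can
show that for some choice of $\lambda$ there is a constant $B_\lambda < \infty$ such that

$$\sum_{n=1}^\infty\frac{1}{m+n}\Big(\frac{m}{n}\Big)^{2\lambda} \le B_\lambda \quad\text{for all } m \ge 1. \tag{10.5}$$

Now we just need to estimate the sum (10.5), and we first recall that
for any nonnegative decreasing function $f : (0, \infty) \to \mathbb{R}$, we have the
integral bound

$$\sum_{n=1}^\infty f(n) \le \int_0^\infty f(x)\,dx.$$

In the specific case of $f(x) = m^{2\lambda}x^{-2\lambda}(m + x)^{-1}$, we therefore find

$$\sum_{n=1}^\infty\frac{1}{m+n}\Big(\frac{m}{n}\Big)^{2\lambda} \le \int_0^\infty\frac{1}{m+x}\Big(\frac{m}{x}\Big)^{2\lambda}dx = \int_0^\infty\frac{1}{(1+y)y^{2\lambda}}\,dy, \tag{10.6}$$

where the last equality comes from the change of variables $x = my$. The
integral on the right side of the inequality (10.6) is clearly convergent
when $\lambda$ satisfies $0 < \lambda < 1/2$ (the integrand behaves like $y^{-2\lambda}$ near 0
and like $y^{-1-2\lambda}$ near $\infty$) and, by our earlier observation (10.5), the
existence of any such $\lambda$ would suffice to complete the proof of Hilbert’s
inequality (10.1).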


SEIZING AN OPPORTUNITY 


Our problem has been solved as stated, but we would be derelict in
our duties if we did not take a moment to find the value of the constant
$C$ that is provided by our proof. When we look over our argument, we
actually find that we have proved that Hilbert’s inequality (10.1) must
hold for any $C = C_\lambda$ with

$$C_\lambda = \int_0^\infty\frac{1}{(1+y)y^{2\lambda}}\,dy \quad\text{for } 0 < \lambda < 1/2. \tag{10.7}$$

Naturally, we should find the value of $\lambda$ that provides the smallest of
these.

By a quick and lazy consultation of Mathematica or Maple, we discover
that we are in luck. The integral for $C_\lambda$ turns out to be both simple and
explicit:

$$\int_0^\infty\frac{1}{(1+y)y^{2\lambda}}\,dy = \frac{\pi}{\sin 2\pi\lambda} \quad\text{for } 0 < \lambda < 1/2. \tag{10.8}$$

Now, since $\sin 2\pi\lambda$ is maximized when $\lambda = 1/4$, we see that the smallest
value attained by $C_\lambda$ with $0 < \lambda < 1/2$ is equal to

$$C = C_{1/4} = \int_0^\infty\frac{1}{(1+y)\sqrt{y}}\,dy = \pi. \tag{10.9}$$

Quite remarkably, our direct assault on Hilbert’s inequality has almost
effortlessly provided the sharp constant $C = \pi$ that was discovered by
Schur.
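
Readers without Mathematica or Maple at hand can confirm (10.8) numerically. The sketch below is a quadrature check, not a derivation. The substitution $y = e^u$ is a choice made for this sketch: it turns $C_\lambda$ into $\int_{-\infty}^\infty e^{(1-2\lambda)u}/(1+e^u)\,du$, a smooth integrand that decays exponentially in both directions, so a plain Riemann sum on a long truncated interval is very accurate. NumPy is assumed, and `C_lambda` is a hypothetical helper name.

```python
import numpy as np

def C_lambda(lam, half_width=400.0, h=0.01):
    """Numerical value of (10.7): int_0^inf dy / ((1 + y) y^(2 lam)), 0 < lam < 1/2."""
    u = np.arange(-half_width, half_width, h)
    # log of e^{(1-2 lam) u} / (1 + e^u), computed stably with logaddexp
    log_integrand = (1 - 2 * lam) * u - np.logaddexp(0.0, u)
    return float(np.exp(log_integrand).sum() * h)

for lam in (0.1, 0.2, 0.25, 0.3, 0.4):
    print(lam, C_lambda(lam), np.pi / np.sin(2 * np.pi * lam))
```

Scanning a grid of $\lambda$ values with this helper also shows the minimum sitting at $\lambda = 1/4$, in agreement with (10.9).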

This is a fine achievement for Cauchy’s inequality, but it should not 
be oversold. Many proofs of Hilbert’s inequality are now available, and 
some of these are quite brief. Nevertheless, for the connoisseur of tech- 
niques for exploiting Cauchy’s inequality, this proof of Hilbert’s inequal- 
ity is a sweet victory. 

Finally, there is a small point that we should note in passing. The 
integral (10.8) is actually a textbook classic; both Bak and Newman
(1997) and Cartan (1995) use it to illustrate the standard technique for
integrating $R(x)/x^\alpha$ over $[0, \infty)$ where $R(x)$ is a rational function and
$0 < \alpha < 1$. This integral also has a connection to a noteworthy gamma
function identity that is described in Exercise 10.8. 


OF MIRACLES AND CONVERSES 


For a Cauchy-Schwarz argument to be precise enough to show that
one can take $C = \pi$ in Hilbert’s inequality may seem to require a miracle,
but there is another way of looking at the relation between the two sides
of Hilbert’s inequality that makes it clear that no miracle was required.
With the right point of view, one can see that both $\pi$ and the special
integrals (10.8) have an inevitable role. To develop this connection, we
will take on the challenge of proving a converse to our first problem. 


Problem 10.2 Suppose that the constant $C$ satisfies

$$\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{a_mb_n}{m+n} \le C\Big(\sum_{m=1}^\infty a_m^2\Big)^{1/2}\Big(\sum_{n=1}^\infty b_n^2\Big)^{1/2} \tag{10.10}$$

for all pairs of sequences of real numbers $\{a_n\}$ and $\{b_n\}$. Show that
$C \ge \pi$.


If we plug any pair of sequences $\{a_n\}$ and $\{b_n\}$ into the inequality
(10.10) we will get some lower bound on $C$, but we will not get too
far with this process unless we find some systematic way to guide our
choices. What we would really like is a parametric family of pairs $\{a_n(\epsilon)\}$
and $\{b_n(\epsilon)\}$ that provide us with a sequence of lower bounds on $C$ that
approach $\pi$ as $\epsilon \to 0$. This surely sounds good, but how do we find
appropriate candidates for $\{a_n(\epsilon)\}$ and $\{b_n(\epsilon)\}$?


STRESS TESTING AN INEQUALITY 


Two basic ideas can help us narrow our search. First, we need to be 
able to calculate (or estimate) the sums that appear in the inequality 
(10.10). We cannot do many sums, so this definitely limits our search. 
The second idea is more subtle; we need to put the inequality under 
stress. This general notion has many possible interpretations, but here it 
at least suggests that we should look for sequences $\{a_n(\epsilon)\}$ and $\{b_n(\epsilon)\}$
such that all the quantities in the inequality (10.10) tend to infinity
as $\epsilon \to 0$. This particular strategy for stressing the inequality (10.10)
may not seem too compelling when one faces it for the first time, but
experience with even a few examples is enough to convince most people
that the principle contains more than a drop of wisdom.

Without a doubt, the most natural candidates for $\{a_n(\epsilon)\}$ and $\{b_n(\epsilon)\}$
are given by the identical twins

$$a_n(\epsilon) = b_n(\epsilon) = n^{-1/2-\epsilon}.$$

For this choice, one may easily work out the estimates that are needed
to understand the right-hand side of Hilbert’s inequality. Specifically,
we see that as $\epsilon \to 0$ we have

$$\sum_{n=1}^\infty a_n^2(\epsilon) = \sum_{n=1}^\infty b_n^2(\epsilon) = \sum_{n=1}^\infty\frac{1}{n^{1+2\epsilon}} \sim \int_1^\infty\frac{dx}{x^{1+2\epsilon}} = \frac{1}{2\epsilon}. \tag{10.11}$$

CLOSING THE LOOP 


To complete the solution of Problem 10.2, we only need to show
that the corresponding sum for the left-hand side of Hilbert’s inequality
(10.10) is asymptotic to $\pi/(2\epsilon)$ as $\epsilon \to 0$, since by (10.11) the right-hand
side is asymptotic to $C/(2\epsilon)$, and this forces $C \ge \pi$. This is indeed the case, and the
computation is instructive. We lay out the result as a lemma.


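Double Sum Lemma.

$$\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{1}{m^{1/2+\epsilon}\,n^{1/2+\epsilon}\,(m+n)} \sim \frac{\pi}{2\epsilon} \quad\text{as } \epsilon \to 0.$$

For the proof, we first note that integral comparisons tell us that it
suffices to show

$$I(\epsilon) = \int_1^\infty\int_1^\infty\frac{dx\,dy}{x^{1/2+\epsilon}\,y^{1/2+\epsilon}\,(x+y)} \sim \frac{\pi}{2\epsilon} \quad\text{as } \epsilon \to 0,$$

and the change of variables $u = y/x$ also tells us that

$$I(\epsilon) = \int_1^\infty x^{-1-2\epsilon}\Big\{\int_{1/x}^\infty\frac{u^{-1/2-\epsilon}}{1+u}\,du\Big\}\,dx. \tag{10.12}$$

This integral would be easy to calculate if we could replace the lower
limit $1/x$ of the inside integral by 0, and, to estimate how much damage
such a change would cause, we first note that

$$0 < \int_0^{1/x}\frac{u^{-1/2-\epsilon}}{1+u}\,du < \int_0^{1/x}u^{-1/2-\epsilon}\,du = \frac{x^{-1/2+\epsilon}}{1/2-\epsilon}.$$
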
When we use this bound in equation (10.12) and write the result using
the big O notation of Landau (say, as defined on page 120), then we find

$$I(\epsilon) = \int_1^\infty x^{-1-2\epsilon}\,dx\int_0^\infty\frac{u^{-1/2-\epsilon}}{1+u}\,du + O\Big(\int_1^\infty x^{-3/2-\epsilon}\,dx\Big) = \frac{1}{2\epsilon}\int_0^\infty\frac{u^{-1/2-\epsilon}}{1+u}\,du + O(1).$$

Finally, for $\epsilon \to 0$, we see from our earlier experience with the integral
(10.9) that we have

$$\int_0^\infty\frac{u^{-1/2-\epsilon}}{1+u}\,du \to \int_0^\infty\frac{u^{-1/2}}{1+u}\,du = \pi,$$

so the proof of the lemma is complete.
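
A different numerical view of the same fact, offered as an illustration rather than part of the argument, looks at finite sections. For sequences supported on $\{1, \ldots, N\}$, the best constant in (10.10) is the largest eigenvalue of the symmetric matrix $[1/(m+n)]_{m,n=1}^N$. These lower bounds for $C$ can only increase with $N$, and by the Double Sum Lemma they cannot stay below any fixed number less than $\pi$. One should expect the approach to $\pi$ to be slow. The sketch assumes NumPy’s `numpy.linalg.eigvalsh`, and the helper name is hypothetical.

```python
import numpy as np

def finite_section_constant(N):
    """Largest eigenvalue of the N x N matrix [1/(m+n)], m, n = 1..N.

    This is the best constant in (10.10) for sequences supported on {1, ..., N},
    so it is a lower bound for any admissible C.
    """
    idx = np.arange(1, N + 1)
    H = 1.0 / (idx[:, None] + idx[None, :])
    return float(np.linalg.eigvalsh(H)[-1])   # eigvalsh returns ascending eigenvalues

for N in (10, 100, 1000):
    print(N, finite_section_constant(N), np.pi)
```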


FINDING THE CIRCLE IN HILBERT’S INEQUALITY 


Any time $\pi$ appears in a problem that has no circle in sight, there is
a certain sense of mystery. Sometimes this mystery remains without a
satisfying resolution, but, in the case of Hilbert’s inequality, a geometric
explanation for the appearance of $\pi$ was found in 1993 by Krzysztof
Oleszkiewicz. This discovery is a bit off of our central theme, but it does
build on the calculations we have just completed, and it is too lovely to 
miss. 


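Quarter Circle Lemma. For all $m \ge 1$, we have the bound

$$\sum_{n=1}^\infty\frac{1}{m+n}\Big(\frac{m}{n}\Big)^{1/2} < \pi. \tag{10.13}$$

For the proof, we first note that the shaded triangle of Figure 10.1
is similar to the triangle $T$ determined by $(0,0)$, $(\sqrt{m}, \sqrt{n-1})$, and
$(\sqrt{m}, \sqrt{n})$, and the area of $T$ is simply $\frac{1}{2}\sqrt{m}(\sqrt{n} - \sqrt{n-1})$. Thus, one
finds by scaling that the area $A_n$ of the shaded triangle is given by

$$A_n = \frac{m}{m+n}\cdot\frac{1}{2}\sqrt{m}\,\big(\sqrt{n} - \sqrt{n-1}\big). \tag{10.14}$$

Since $1/\sqrt{x}$ is decreasing on $(0, \infty)$, we have

$$\sqrt{n} - \sqrt{n-1} = \int_{n-1}^n\frac{dx}{2\sqrt{x}} > \frac{1}{2\sqrt{n}},$$

so, in the end, we find

$$\frac{1}{m+n}\Big(\frac{m}{n}\Big)^{1/2} < \frac{4}{m}A_n. \tag{10.15}$$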


Finally, what makes this geometric bound most interesting is that all 


Fig. 10.1. The shaded triangle is similar to the triangle determined by the
three points $(0,0)$, $(\sqrt{m}, \sqrt{n-1})$, and $(\sqrt{m}, \sqrt{n})$ so we can determine its area
by geometry. Also, the triangles $T_n$ have disjoint interiors so the sum of their
areas cannot exceed $\pi m/4$. These facts give us the proof of the Quarter Circle
Lemma.


of the shaded triangles are contained in the quarter circle. They have 
disjoint interiors, so we find that the sum of their areas is bounded by
$\pi m/4$, the area of the quarter circle with radius $\sqrt{m}$ that contains them.
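
The geometric bookkeeping can be checked numerically with the sketch below, which is an illustration and not a proof. For a given $m$, it computes the terms of (10.13) and the triangle areas (10.14) for $n \le N$. It then confirms the termwise bound (10.15) and compares the two partial sums with $\pi$. NumPy is assumed, and the helper name is hypothetical. The difference $\sqrt{n} - \sqrt{n-1}$ is evaluated as $1/(\sqrt{n} + \sqrt{n-1})$ to avoid cancellation.

```python
import numpy as np

def quarter_circle_terms(m, N):
    """Terms of (10.13) for n = 1..N together with the triangle areas A_n of (10.14)."""
    n = np.arange(1, N + 1, dtype=float)
    terms = np.sqrt(m / n) / (m + n)
    gaps = 1.0 / (np.sqrt(n) + np.sqrt(n - 1))   # equals sqrt(n) - sqrt(n - 1)
    areas = (m / (m + n)) * 0.5 * np.sqrt(m) * gaps
    return terms, areas

for m in (1, 10, 100, 1000):
    terms, areas = quarter_circle_terms(m, 10**6)
    print(m, terms.sum(), 4 / m * areas.sum(), np.pi)
```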


EXERCISES 


Exercise 10.1 (Guaranteed Positivity) 


Show that for any real numbers $a_1, a_2, \ldots, a_n$ one has

$$\sum_{j,k=1}^n\frac{a_ja_k}{j+k} \ge 0 \tag{10.16}$$

and, more generally, show that for positive $\lambda_1, \lambda_2, \ldots, \lambda_n$ one has

$$\sum_{j,k=1}^n\frac{a_ja_k}{\lambda_j+\lambda_k} \ge 0. \tag{10.17}$$

Obviously the second bound implies the first, so the bound (10.16)
is mainly a hint which makes the link to Hilbert’s inequality. As a
better hint, one might consider the possibility of representing $1/\lambda$ as
an integral.
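
In matrix language, (10.17) says that the Cauchy matrix $[1/(\lambda_j + \lambda_k)]$ is positive semidefinite. The sketch below checks this numerically for random positive $\lambda_j$, using NumPy’s `numpy.linalg.eigvalsh` and a random quadratic form. It is an illustration of the claim and does not use the hinted integral representation. The helper name is hypothetical.

```python
import numpy as np

def cauchy_matrix(lams):
    """The n x n matrix with entries 1/(lambda_j + lambda_k)."""
    lams = np.asarray(lams, dtype=float)
    return 1.0 / (lams[:, None] + lams[None, :])

rng = np.random.default_rng(0)
K = cauchy_matrix(rng.uniform(0.1, 5.0, size=12))
a = rng.normal(size=12)
print(a @ K @ a, np.linalg.eigvalsh(K).min())
```

Such matrices are notoriously ill-conditioned, so the smallest computed eigenvalue can be tiny. Any tolerance used to test nonnegativity should therefore be relative to the largest eigenvalue.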




Exercise 10.2 (Insertion of a Fudge Factor) 

There are many ways to continue the theme of Exercise 10.1, and this 
exercise is one of the most useful. It provides a generic way to leverage 
an inequality such as Hilbert’s. 

Show that if the complex array $\{a_{jk} : 1 \le j \le m, 1 \le k \le n\}$ satisfies
the bound

$$\Big|\sum_{j,k}a_{jk}x_jy_k\Big| \le M\|x\|_2\|y\|_2, \tag{10.18}$$

then one also has the bound

$$\Big|\sum_{j,k}a_{jk}h_{jk}x_jy_k\Big| \le \alpha\beta M\|x\|_2\|y\|_2, \tag{10.19}$$

provided that the factors $h_{jk}$ have an integral representation of the form

$$h_{jk} = \int_D f_j(t)g_k(t)\,dt \tag{10.20}$$

for which for all $j$ and $k$ one has the bounds

$$\int_D|f_j(t)|^2\,dt \le \alpha^2 \quad\text{and}\quad \int_D|g_k(t)|^2\,dt \le \beta^2. \tag{10.21}$$
D D 


Exercise 10.3 (Max Version of Hilbert’s Inequality) 
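Show that for every pair of sequences of real numbers $\{a_n\}$ and $\{b_n\}$
one has

$$\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{a_mb_n}{\max(m,n)} < 4\Big(\sum_{m=1}^\infty a_m^2\Big)^{1/2}\Big(\sum_{n=1}^\infty b_n^2\Big)^{1/2}, \tag{10.22}$$

and show that 4 may not be replaced by a smaller constant.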


Exercise 10.4 (Integral Version) 
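Prove the integral form of Hilbert’s inequality. That is, show that for
any $f, g : [0, \infty) \to \mathbb{R}$, one has

$$\int_0^\infty\int_0^\infty\frac{f(x)g(y)}{x+y}\,dx\,dy < \pi\Big(\int_0^\infty f^2(x)\,dx\Big)^{1/2}\Big(\int_0^\infty g^2(y)\,dy\Big)^{1/2}.$$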


The discrete Hilbert inequality (10.1) can be used to prove a continuous 
version, but the strict inequality would be lost in the process. Typically, 
it is better to mimic the earlier argument rather than to apply the earlier 
result. 
Exercise 10.5 (Homogeneous Kernel Version)

If the function $K : [0, \infty) \times [0, \infty) \to [0, \infty)$ has the homogeneity
property $K(\lambda x, \lambda y) = \lambda^{-1}K(x, y)$ for all $\lambda > 0$, then for any pair of
functions $f, g : [0, \infty) \to \mathbb{R}$, one has

$$\int_0^\infty\int_0^\infty K(x,y)f(x)g(y)\,dx\,dy \le C\Big(\int_0^\infty f^2(x)\,dx\Big)^{1/2}\Big(\int_0^\infty g^2(y)\,dy\Big)^{1/2},$$

where the constant $C$ is given by the common value of the integrals

$$\int_0^\infty K(1,y)\frac{dy}{\sqrt{y}} = \int_0^\infty K(y,1)\frac{dy}{\sqrt{y}} = \int_1^\infty\frac{K(1,y) + K(y,1)}{\sqrt{y}}\,dy.$$


Exercise 10.6 (The Method of “Parameterized Parameters”)
For any positive weights $w_k$, $k = 1, 2, \ldots, n$, Cauchy’s inequality can
be restated as a bound on the square of a general sum,

$$(a_1 + a_2 + \cdots + a_n)^2 \le \Big\{\sum_{k=1}^n\frac{1}{w_k}\Big\}\Big\{\sum_{k=1}^n a_k^2w_k\Big\}, \tag{10.23}$$

and given such a bound it is sometimes useful to note the values $w_k$,
$k = 1, 2, \ldots, n$, can be regarded as free parameters. The natural question
then becomes, “What can be done with this freedom?” Oddly enough,
one may then benefit from introducing yet another real parameter $t$ so
that we can write each weight $w_k$ as $w_k(t)$. This purely psychological
step hopes to simplify our search for a wise choice of the $w_k$ by
refocusing our attention on desirable properties of the functions $w_k(t)$,
$k = 1, 2, \ldots, n$.

Here we want to squeeze information out of the bound (10.23), and 
one concrete idea is to look for choices where (1) the first factor of 
the product (10.23) is bounded uniformly in t and where (2) one can 
calculate the minimum value over all t of the second factor. These may 
seem like tall orders, but they can be filled and the next three steps show 
how this plan leads to some marvelous inferences. 

(a) Show that if one takes $w_k(t) = t + k^2/t$ for $k = 1, 2, \ldots, n$ then
the first factor of the inequality (10.23) is bounded by $\pi/2$ for all $t > 0$
and all $n = 1, 2, \ldots$.

(b) Show that for this choice we also have the identity 


n n 1 n 1 
. 2 = 2\° 22° 
mag { Doaten(} = 24 doa} d! ayn P . 


k=1 
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(c) Combine the preceding observations to conclude that 
n 4 n n 
{dra} caf Sratht Saath, (10.24) 
k=1 k=1 k=1 
This curious bound is known as Carlson’s inequality, and it has been 
known since 1934. Despite several almost arbitrary steps on the path 
to the inequality (10.24), the value 7? cannot be replaced by a smaller 
one, as one can prove by the stress testing method (page 159), though 
not without thought. 
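A quick numerical sanity check of parts (a) and (c) may help fix the statements in mind; it is a spot check under our own choices (sample values of $t$, random test sequences), not a proof. The helper names `first_factor` and `carlson_sides` are hypothetical.

```python
import math
import random

def first_factor(t, n):
    # sum_{k=1}^n 1 / w_k(t) for the weights w_k(t) = t + k^2 / t of part (a)
    return sum(1.0 / (t + k * k / t) for k in range(1, n + 1))

def carlson_sides(a):
    # (left side, right side) of Carlson's inequality (10.24) for a = [a_1, ..., a_n]
    s1 = sum(a)
    s2 = sum(x * x for x in a)
    s3 = sum((k * x) ** 2 for k, x in enumerate(a, start=1))
    return s1 ** 4, math.pi ** 2 * s2 * s3

# part (a): the first factor stays below pi/2 at every sampled t > 0
worst = max(first_factor(t, 500) for t in (0.01, 0.1, 1.0, 10.0, 100.0, 1000.0))
print(worst, math.pi / 2)

# part (c): Carlson's inequality on random nonnegative sequences
random.seed(1)
for _ in range(1000):
    a = [random.random() for _ in range(random.randint(1, 30))]
    lhs, rhs = carlson_sides(a)
    assert lhs <= rhs
```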


Exercise 10.7 (Hilbert’s Inequality via the Toeplitz Method) 
Show that the elementary integral

$$\frac{1}{2\pi} \int_0^{2\pi} (t - \pi)\, e^{int}\, dt = \frac{1}{in}, \qquad n \ne 0,$$

implies that for real $a_k$, $b_k$, $1 \le k \le N$, one has the integral representation

$$I = \frac{1}{2\pi} \int_0^{2\pi} i(t - \pi) \sum_{k=1}^N a_k e^{ikt} \sum_{k=1}^N b_k e^{ikt}\, dt = \sum_{m=1}^N \sum_{n=1}^N \frac{a_m b_n}{m + n};$$

then show that this representation and Schwarz’s inequality yield a quick and easy proof of Hilbert’s inequality.
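The integral representation can be checked numerically before one proves it. The sketch below is our own illustration: it approximates the integral by the midpoint rule (the step count `M` is an arbitrary choice), compares it with the double sum, and also checks the Hilbert bound $|\sum\sum a_m b_n/(m+n)| \le \pi \|a\| \|b\|$ that the Schwarz step delivers. The function names are hypothetical.

```python
import cmath
import math
import random

def toeplitz_integral(a, b, M=20000):
    # midpoint-rule approximation of (1/2π) ∫_0^{2π} i (t - π) A(t) B(t) dt,
    # where A(t) = Σ a_k e^{ikt} and B(t) = Σ b_k e^{ikt}
    h = 2 * math.pi / M
    total = 0j
    for j in range(M):
        t = (j + 0.5) * h
        z = cmath.exp(1j * t)
        A = sum(ak * z ** k for k, ak in enumerate(a, start=1))
        B = sum(bk * z ** k for k, bk in enumerate(b, start=1))
        total += 1j * (t - math.pi) * A * B
    return total * h / (2 * math.pi)

def hilbert_form(a, b):
    # Σ_m Σ_n a_m b_n / (m + n)
    return sum(am * bn / (m + n)
               for m, am in enumerate(a, start=1)
               for n, bn in enumerate(b, start=1))

random.seed(0)
N = 8
a = [random.uniform(-1, 1) for _ in range(N)]
b = [random.uniform(-1, 1) for _ in range(N)]
approx = toeplitz_integral(a, b)
exact = hilbert_form(a, b)
norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
print(approx, exact, abs(exact) <= math.pi * norm)
```

The integrand is not periodic (because of the factor $t - \pi$), so the midpoint rule is only second-order accurate here; that is ample for a sanity check.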


Exercise 10.8 (Functional Equation for the Gamma Function) 
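Recall that the gamma function is defined by the integral

$$\Gamma(\lambda) = \int_0^\infty x^{\lambda - 1} e^{-x}\, dx,$$

and use an integral representation for $1/(1+y)$ to show that

$$\int_0^\infty \frac{1}{y^{2\lambda}}\, \frac{1}{1+y}\, dy = \Gamma(2\lambda)\, \Gamma(1 - 2\lambda) \qquad \text{for } 0 < \lambda < 1/2. \qquad (10.25)$$

As a consequence, one finds that the evaluation of the integral (10.8) yields the famous functional equation for the Gamma function,

$$\Gamma(2\lambda)\, \Gamma(1 - 2\lambda) = \frac{\pi}{\sin 2\pi\lambda}, \qquad 0 < \lambda < 1/2.$$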
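As a numerical cross-check of (10.25) and of the reflection formula $\Gamma(s)\Gamma(1-s) = \pi/\sin(\pi s)$ at $s = 2\lambda$, one can integrate directly. This is a sketch under our own assumptions: the substitution $y = e^u$ and the truncation of the $u$-range to $[-200, 200]$ are choices made only for this check, and the sample values of $\lambda$ stay away from the endpoints $0$ and $1/2$ so that the discarded tails are negligible. The helper names are hypothetical.

```python
import math

def simpson(f, a, b, n):
    # composite Simpson rule with n (even) subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def beta_integral(lam, L=200.0, n=40000):
    # ∫_0^∞ y^(-2λ) / (1 + y) dy after the substitution y = e^u, dy = e^u du,
    # i.e. ∫_{-∞}^{∞} e^{(1-2λ)u} / (1 + e^u) du with the tails |u| > L dropped
    f = lambda u: math.exp((1 - 2 * lam) * u) / (1 + math.exp(u))
    return simpson(f, -L, L, n)

for lam in (0.15, 0.25, 0.35):
    integral = beta_integral(lam)
    gammas = math.gamma(2 * lam) * math.gamma(1 - 2 * lam)
    reflection = math.pi / math.sin(2 * math.pi * lam)
    print(lam, integral, gammas, reflection)
```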


11 
Hardy’s Inequality and the Flop 


The flop is a simple algebraic manipulation, but many who master it 
feel that they are forever changed. This is not to say that the flop 
is particularly miraculous; in fact, it is perfectly ordinary. What may 
distinguish the flop among mathematical techniques is that it works at 
two levels: it is tactical in that it is just a step in an argument, and it 
is strategic in that it suggests general plans which can have a variety of 
twists and turns. 

To illustrate the flop, we call on a concrete challenge problem of in- 
dependent interest. This time the immediate challenge is to prove an 
inequality of G.H. Hardy which he discovered while looking for a new 
proof of the famous inequality of Hilbert that anchored the preceding 
chapter. Hardy’s inequality is now widely used in both pure and applied 
mathematics, and many would consider it to be equal in importance to 
Hilbert’s inequality. 


Problem 11.1 (Hardy’s Inequality) 
Show that every integrable function $f : (0,T) \to \mathbb{R}$ satisfies the inequality

$$\int_0^T \Big\{ \frac{1}{x} \int_0^x f(u)\, du \Big\}^2 dx \le 4 \int_0^T f^2(x)\, dx, \qquad (11.1)$$


and show, moreover, that the constant 4 cannot be replaced with any 
smaller value. 


To become familiar with this inequality, one should note that it provides a concrete interpretation of the general idea that the average of a function typically behaves as well as (or at least not much worse than) the function itself. Here we see that the square integral of the average is never more than four times the square integral of the original.
To deepen our understanding of the bound (11.1), we might also see if we can confirm that the constant 4 is actually the best one can do. One natural idea is to try the stress testing method (page 159) which helped us before. Here the test function that seems to occur first to almost everyone is simply the power map $x \mapsto x^\alpha$. When we substitute this function into an inequality of the form

$$\int_0^T \Big\{ \frac{1}{x} \int_0^x f(u)\, du \Big\}^2 dx \le C \int_0^T f^2(x)\, dx, \qquad (11.2)$$

we see (after cancelling the common factor $T^{2\alpha+1}$) that it implies

$$\frac{1}{(\alpha+1)^2 (2\alpha+1)} \le \frac{C}{2\alpha+1} \qquad \text{for all } \alpha \text{ such that } 2\alpha + 1 > 0.$$

Now, by letting $\alpha \to -1/2$, we see that for the bound (11.2) to hold in general one must have $C \ge 4$. Thus, we have another pleasing victory for the stress testing technique. Knowing that a bound cannot be improved always adds some extra zest to the search for a proof.
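For readers who want the intermediate step, here is the stress test computation written out. It is a short worked sketch, assuming $2\alpha + 1 > 0$ so that both integrals are finite:

$$\frac{1}{x}\int_0^x u^\alpha\, du = \frac{x^\alpha}{\alpha+1}, \qquad \int_0^T \frac{x^{2\alpha}}{(\alpha+1)^2}\, dx = \frac{T^{2\alpha+1}}{(\alpha+1)^2(2\alpha+1)}, \qquad \int_0^T x^{2\alpha}\, dx = \frac{T^{2\alpha+1}}{2\alpha+1}.$$

Dividing the second quantity by the third, the ratio of the left side of (11.2) to $\int_0^T f^2(x)\,dx$ for $f(x) = x^\alpha$ is $1/(\alpha+1)^2$, which increases to $4$ as $\alpha \downarrow -1/2$.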


INTEGRATION BY PARTS — AND ON SPECULATION 


Any time we work with an integral we must keep in mind the many 
alternative forms that it can take after a change of variables or other 
transformation. Here we want to bound the integral of a product of two 
functions, so integration by parts naturally suggests itself, especially 
after the integral is rewritten as

$$I = \int_0^T \Big\{ \int_0^x f(u)\, du \Big\}^2 \frac{1}{x^2}\, dx.$$


There is no way to know a priori if an integration by parts will provide 
us with a more convenient formulation of our problem, but there is also 
no harm in trying, so, for the moment, we simply compute 


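$$I = -\frac{1}{x} \Big\{ \int_0^x f(u)\, du \Big\}^2 \Big|_0^T + 2 \int_0^T \frac{1}{x} \Big\{ \int_0^x f(u)\, du \Big\} f(x)\, dx. \qquad (11.3)$$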


Now, to simplify the last expression, we first note that we may assume that $f$ is square integrable, or else our target inequality (11.1) is trivially true. Also, we note that for any square integrable $f$, Schwarz’s inequality and the 1-trick tell us that for any $x > 0$ we have

$$\Big| \int_0^x f(u)\, du \Big| \le x^{1/2} \Big\{ \int_0^x f^2(u)\, du \Big\}^{1/2} = o(x^{1/2}) \qquad \text{as } x \to 0,$$

so the boundary term at $x = 0$ vanishes and our integration by parts formula (11.3) may be simplified to

$$I = 2 \int_0^T \Big\{ \frac{1}{x} \int_0^x f(u)\, du \Big\} f(x)\, dx - \frac{1}{T} \Big\{ \int_0^T f(u)\, du \Big\}^2.$$


This form of the integral $I$ may not look any more convenient than the original representation, but it does suggest a bold action. The last term is nonpositive, so we can simply discard it from the identity to get

$$\int_0^T \Big\{ \frac{1}{x} \int_0^x f(u)\, du \Big\}^2 dx \le 2 \int_0^T \Big\{ \frac{1}{x} \int_0^x f(u)\, du \Big\} f(x)\, dx. \qquad (11.4)$$


We now face a bottom line question: Is this new bound (11.4) strong 
enough to imply our target inequality (11.1)? The answer turns out to 
be both quick and instructive. 
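A numerical spot check of the simplified integration by parts identity and of the preflop bound (11.4) may be reassuring. The sketch assumes our own test function $f(x) = \cos x$ on $(0, 2)$, for which $\frac{1}{x}\int_0^x f(u)\,du = \sin x / x$; the helper names are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    # composite Simpson rule with n (even) subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def average(x):
    # (1/x) ∫_0^x cos(u) du = sin(x)/x, extended continuously by 1 at x = 0
    return 1.0 if x == 0 else math.sin(x) / x

T = 2.0
lhs = simpson(lambda x: average(x) ** 2, 0.0, T)                 # the integral I
cross = simpson(lambda x: average(x) * math.cos(x), 0.0, T)      # ∫ (average) f dx
boundary = math.sin(T) ** 2 / T                                  # (1/T) {∫_0^T f}^2
rhs_identity = 2 * cross - boundary
energy = simpson(lambda x: math.cos(x) ** 2, 0.0, T)             # ∫ f^2 dx
print(lhs, rhs_identity, lhs <= 2 * cross, lhs <= 4 * energy)
```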


APPLICATION OF THE FLOP 


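If we introduce functions $\varphi$ and $\psi$ by setting

$$\varphi(x) = \frac{1}{x} \int_0^x f(u)\, du \quad \text{and} \quad \psi(x) = f(x), \qquad (11.5)$$

then the new inequality (11.4) can be written crisply as

$$\int_0^T \varphi^2(x)\, dx \le C \int_0^T \varphi(x) \psi(x)\, dx, \qquad (11.6)$$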


where $C = 2$. The critical feature of this inequality is that the function $\varphi$ is raised to a higher power on the left side of the inequality than on the right. This is far from a minor detail; it opens up the possibility of a maneuver which has featured in thousands of investigations.

The key observation is that by applying Schwarz’s inequality to the right-hand side of the inequality (11.6), we find

$$\int_0^T \varphi^2(x)\, dx \le C \Big\{ \int_0^T \varphi^2(x)\, dx \Big\}^{1/2} \Big\{ \int_0^T \psi^2(x)\, dx \Big\}^{1/2}, \qquad (11.7)$$

so, if $\varphi(x)$ is not identically zero, we can divide both sides of this inequality by

$$\Big\{ \int_0^T \varphi^2(x)\, dx \Big\}^{1/2} \ne 0.$$

This division gives us

$$\Big\{ \int_0^T \varphi^2(x)\, dx \Big\}^{1/2} \le C \Big\{ \int_0^T \psi^2(x)\, dx \Big\}^{1/2}, \qquad (11.8)$$

and, when we square this inequality and replace $C$, $\varphi$, and $\psi$ with their defining values (11.5), we see that the “postflop” inequality (11.8) is exactly the same as the target inequality (11.1) which we hoped to prove.


A DISCRETE ANALOG 


One can always ask if a given result for real or complex functions 
has an analog for finite or infinite sequences, and the answer is often 
routine. Nevertheless, there are also times when one meets unexpected 
difficulties that lead to new insight. We will face just such a situation 
in our second challenge problem. 


Problem 11.2 (The Discrete Hardy Inequality) 
Show that for any sequence of nonnegative real numbers $a_1, a_2, \ldots, a_N$ one has the inequality

$$\sum_{n=1}^N \Big\{ \frac{1}{n} (a_1 + a_2 + \cdots + a_n) \Big\}^2 \le 4 \sum_{n=1}^N a_n^2. \qquad (11.9)$$
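Before looking for a proof, one can watch the inequality (11.9) at work. The sketch below is our own illustration: it checks (11.9) on random nonnegative sequences and then stress tests it with $a_n = n^{-1/2}$, the discrete cousin of the power map used above, for which the ratio of the left side to $\sum a_n^2$ creeps slowly toward 4. The helper name `hardy_sides` is hypothetical.

```python
import random

def hardy_sides(a):
    # (left side, right side) of the discrete Hardy inequality (11.9)
    lhs, partial = 0.0, 0.0
    for n, an in enumerate(a, start=1):
        partial += an                      # a_1 + ... + a_n
        lhs += (partial / n) ** 2
    return lhs, 4 * sum(x * x for x in a)

random.seed(2)
for _ in range(500):
    a = [random.random() for _ in range(random.randint(1, 50))]
    lhs, rhs = hardy_sides(a)
    assert lhs <= rhs

# stress test with a_n = n**(-1/2): lhs / sum(a_n^2) grows toward 4 as N grows
ratios = {}
for N in (10, 1000, 100000):
    lhs, rhs = hardy_sides([n ** -0.5 for n in range(1, N + 1)])
    ratios[N] = lhs / (rhs / 4)
print(ratios)
```

The growth is only logarithmic in $N$, which is why the constant 4 is never attained by a finite sequence.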


Surely the most natural way to approach this problem is to mimic the method we used for the first challenge problem. Moreover, our earlier experience also provides mileposts that can help us measure our progress. In particular, it is reasonable to guess that to prove the inequality (11.9) by an application of a flop, we might do well to look for a “preflop” inequality of the form

$$\sum_{n=1}^N \Big\{ \frac{1}{n} (a_1 + a_2 + \cdots + a_n) \Big\}^2 \le 2 \sum_{n=1}^N \Big\{ \frac{1}{n} (a_1 + a_2 + \cdots + a_n) \Big\} a_n, \qquad (11.10)$$

which is the natural analog of our earlier preflop bound (11.4).


FOLLOWING THE NATURAL PLAN 


Summation by parts is the natural analog of integration by parts, although it is a bit less mechanical. Here, for example, we must decide how to represent $1/n^2$ as a difference; after all, we can either write

$$\frac{1}{n^2} = s_n - s_{n+1} \quad \text{where} \quad s_n = \sum_{k=n}^\infty \frac{1}{k^2},$$

or, alternatively, we can look at the initial sum and write

$$\frac{1}{n^2} = S_n - S_{n-1} \quad \text{where} \quad S_n = \sum_{k=1}^n \frac{1}{k^2}.$$

The only universal basis for a sound choice is experimentation, so, for the moment, we simply take the first option.
Now, if we let $T_N$ denote the sum on the left-hand side of the target inequality (11.9), then we have

$$T_N = \sum_{n=1}^N (s_n - s_{n+1}) (a_1 + a_2 + \cdots + a_n)^2,$$

so, by distributing the sums and shifting the indices, we have

$$T_N = \sum_{n=1}^N s_n (a_1 + a_2 + \cdots + a_n)^2 - \sum_{n=2}^{N+1} s_n (a_1 + a_2 + \cdots + a_{n-1})^2.$$

When we bring the sums back together, we see that $T_N$ equals

$$s_1 a_1^2 - s_{N+1} (a_1 + a_2 + \cdots + a_N)^2 + \sum_{n=2}^N s_n \big\{ 2 (a_1 + a_2 + \cdots + a_{n-1}) a_n + a_n^2 \big\}$$

and, since $s_{N+1} (a_1 + a_2 + \cdots + a_N)^2 \ge 0$ and $s_n a_n^2 \le 2 s_n a_n^2$, we at last find

$$\sum_{n=1}^N \Big\{ \frac{1}{n} (a_1 + a_2 + \cdots + a_n) \Big\}^2 \le 2 \sum_{n=1}^N \big\{ s_n (a_1 + a_2 + \cdots + a_n) \big\} a_n. \qquad (11.11)$$

This bound looks much like our target preflop inequality (11.10), but there is a small problem: on the right side we have $s_n$ where we hoped to have $1/n$. Since $s_n = 1/n + O(1/n^2)$, we seem to have made progress, but the prize (11.10) is not in our hands.
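The regrouping step is the kind of index shuffling where a slip is easy, so here is a small numerical check of it, together with the resulting bound (11.11). It is a sketch under our own assumptions: the tail sums are computed as $s_n = \pi^2/6 - \sum_{k<n} 1/k^2$, and the random test sequence and seed are arbitrary.

```python
import math
import random

def tail_sums(N):
    # s[n] = Σ_{k >= n} 1/k^2 = π^2/6 - Σ_{k < n} 1/k^2, for n = 1, ..., N+1
    s, head = [0.0], 0.0          # s[0] is a placeholder so that s[n] = s_n
    for n in range(1, N + 2):
        s.append(math.pi ** 2 / 6 - head)
        head += 1.0 / n ** 2
    return s

random.seed(3)
N = 40
a = [random.random() for _ in range(N)]   # a[n-1] holds a_n
s = tail_sums(N)
A = [0.0]                                 # A[n] = a_1 + ... + a_n
for x in a:
    A.append(A[-1] + x)

T_N = sum((A[n] / n) ** 2 for n in range(1, N + 1))
regrouped = (s[1] * a[0] ** 2 - s[N + 1] * A[N] ** 2
             + sum(s[n] * (2 * A[n - 1] * a[n - 1] + a[n - 1] ** 2)
                   for n in range(2, N + 1)))
bound_11_11 = 2 * sum(s[n] * A[n] * a[n - 1] for n in range(1, N + 1))
print(T_N, regrouped, T_N <= bound_11_11)
```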


SO NEAR ... YET


One natural way to try to bring our plan to its logical conclusion is simply to replace the sum $s_n$ in the inequality (11.11) by an honest upper bound. The most systematic way to estimate $s_n$ is by integral comparison, but there is also an instructive telescoping argument that gives an equivalent result. The key observation is that for $n \ge 2$ we have

$$s_n = \sum_{k=n}^\infty \frac{1}{k^2} < \sum_{k=n}^\infty \frac{1}{k(k-1)} = \sum_{k=n}^\infty \Big\{ \frac{1}{k-1} - \frac{1}{k} \Big\} = \frac{1}{n-1} \le \frac{2}{n},$$

and, since $s_1 = 1 + s_2 < 1 + 1/(2-1) = 2$, we see that

$$s_n \le \frac{2}{n} \qquad \text{for all } n \ge 1. \qquad (11.12)$$
Now, when we use this bound in our summation by parts inequality (11.11), we find

$$\sum_{n=1}^N \Big\{ \frac{1}{n} (a_1 + a_2 + \cdots + a_n) \Big\}^2 \le 4 \sum_{n=1}^N \Big\{ \frac{1}{n} (a_1 + a_2 + \cdots + a_n) \Big\} a_n, \qquad (11.13)$$

and this is almost the inequality (11.10) that we wanted to prove. The only difference is that the constant 2 in the preflop inequality (11.10) has been replaced by a 4. Unfortunately, this difference is enough to keep us from our ultimate goal. When we apply the flop to the inequality (11.13), we fail to get the constant that is required in our challenge problem; since the flop squares the preflop constant, we get a 16 where a 4 is needed.


TAKING THE FLOP AS OUR GUIDE 


Once again, the obvious plan has come up short, and we must look for some way to improve our argument. Certainly we can sharpen our estimate for $s_n$, but, before worrying about small analytic details, we should look at the structure of our plan. We used summation by parts because we hoped to replicate a successful argument that used integration by parts, but the most fundamental component of our argument simply calls on us to prove the preflop inequality

$$\sum_{n=1}^N \Big\{ \frac{1}{n} (a_1 + a_2 + \cdots + a_n) \Big\}^2 \le 2 \sum_{n=1}^N \Big\{ \frac{1}{n} (a_1 + a_2 + \cdots + a_n) \Big\} a_n. \qquad (11.14)$$


There is no law that says that we must prove this inequality by starting 
with the left-hand side and using summation by parts. If we stay flexible, 
perhaps we can find a fresh approach. 


FLEXIBLE AND HOPEFUL 


To begin our fresh approach, we may as well work toward a clearer view of our problem; certainly some of the clutter may be removed by setting $A_n = (a_1 + a_2 + \cdots + a_n)/n$. Also, if we consider the term-by-term differences $\Delta_n$ between the summands in the preflop inequality (11.14), then we have the simple identity $\Delta_n = A_n^2 - 2 A_n a_n$. The proof of the preflop inequality (11.14) therefore comes down to showing that the sum of the increments $\Delta_n$ over $1 \le n \le N$ is bounded by zero.

We now have a concrete goal, but not much else. Still, we may recall that one of the few ways we have to simplify sums is by telescoping. Thus, even though no telescoping sums are presently in sight, we might want to explore the algebra of the difference $\Delta_n$ while keeping the possibility of telescoping in mind. If we now try to write $\Delta_n$ just in terms of $A_n$ and $A_{n-1}$ (with the convention $A_0 = 0$), then we have

$$\begin{aligned} \Delta_n &= A_n^2 - 2 A_n a_n \\ &= A_n^2 - 2 A_n \big( n A_n - (n-1) A_{n-1} \big) \\ &= (1 - 2n) A_n^2 + 2 (n-1) A_n A_{n-1}, \end{aligned}$$

but unfortunately the product $A_n A_{n-1}$ emerges as a new trouble spot. Nevertheless, we can eliminate this product if we recall the “humble bound” and note that if we replace $A_n A_{n-1}$ by $(A_n^2 + A_{n-1}^2)/2$ we have

$$\begin{aligned} \Delta_n &\le (1 - 2n) A_n^2 + (n-1) (A_n^2 + A_{n-1}^2) \\ &= (n-1) A_{n-1}^2 - n A_n^2. \end{aligned}$$

After a few dark moments, we now find that we are the beneficiaries of some good luck: the last inequality is one that telescopes beautifully. When we sum over $n$, we find

$$\sum_{n=1}^N \Delta_n \le \sum_{n=1}^N \big\{ (n-1) A_{n-1}^2 - n A_n^2 \big\} = -N A_N^2,$$

and, by the nonpositivity of the last term, the proof of the preflop inequality (11.14) is complete. Finally, we know already that the flop will take us from the inequality (11.14) to the inequality (11.9) of our challenge problem, so the solution of the problem is also complete.
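The telescoping bound $\sum_{n \le N} \Delta_n \le -N A_N^2$ is easy to test numerically. In the sketch below (our own illustration, with a hypothetical helper name) the test sequences are allowed to take negative values, since the telescoping argument never used the sign of the $a_n$.

```python
import random

def preflop_check(a):
    # returns (Σ_{n<=N} Δ_n, -N A_N^2), where A_n = (a_1 + ... + a_n)/n
    # and Δ_n = A_n^2 - 2 A_n a_n
    total, partial, A = 0.0, 0.0, 0.0
    for n, an in enumerate(a, start=1):
        partial += an
        A = partial / n
        total += A * A - 2 * A * an
    return total, -len(a) * A * A

random.seed(4)
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in range(random.randint(1, 40))]
    delta_sum, telescoped = preflop_check(a)
    assert delta_sum <= telescoped + 1e-12
```

Usage: `preflop_check([1.0, 0.0, 2.0])` gives a sum of increments equal to $-1 + 0.25 - 3 = -3.75$, which lies below $-3 A_3^2 = -3$.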


A BRIEF LOOK BACK


Familiarity with the flop gives one access to a rich class of strategies for 
proving inequalities for integrals and for sums. In our second challenge 
problem, we made some headway through imitation of the strategy that 
worked in the continuous case, but definitive progress only came when 
we focused squarely on the flop and when we worked toward a direct 
proof of the preflop inequality

$$\sum_{n=1}^N \Big\{ \frac{1}{n} (a_1 + a_2 + \cdots + a_n) \Big\}^2 \le 2 \sum_{n=1}^N \Big\{ \frac{1}{n} (a_1 + a_2 + \cdots + a_n) \Big\} a_n.$$

The new focus was a fortunate one, and we found that the preflop inequality could be obtained by a pleasing telescoping argument that used little more than the bound $xy \le (x^2 + y^2)/2$.

In the first two examples the flop was achieved with help from Cauchy’s inequality or Schwarz’s inequality, but the basic idea is obviously quite general. In the next problem (and in several of the exercises) we will see that Hölder’s inequality is perhaps the flop’s more natural partner.
CARLESON’S INEQUALITY — WITH CARLEMAN’S AS A COROLLARY


Our next challenge problem presents itself with no flop in sight; there 
is not even a product to be seen. Nevertheless, one soon discovers that 
the product — and the flop — are not far away. 


Problem 11.3 (Carleson’s Convexity Inequality) 
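Show that if $\varphi : [0,\infty) \to \mathbb{R}$ is convex and $\varphi(0) = 0$, then for all $-1 < \alpha < \infty$ one has the integral bound

$$I = \int_0^\infty x^\alpha \exp\Big( -\frac{\varphi(x)}{x} \Big)\, dx \le e^{\alpha+1} \int_0^\infty x^\alpha \exp\big( -\varphi'(x) \big)\, dx, \qquad (11.15)$$

where, as usual, $e = 2.71828\ldots$ is the natural base.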


The shape of the inequality (11.15) is uncharacteristic of any we have met before, so one may be at a loss for a reasonable plan. To be sure, convexity always gives us something useful; in particular, convexity provides an estimate of the shift difference $\varphi(y + t) - \varphi(y)$. Unfortunately this estimate does not seem to help us much here.

The way Carleson cut the Gordian knot was to consider instead the scale shift difference $\varphi(py) - \varphi(y)$ where $p > 1$ is a parameter that we can optimize later. This is a clever idea, yet, once conceived, it easily becomes a part of our permanent toolkit.


A FLOP OF A DIFFERENT FLAVOR


Carleson set up his estimation of the integral $I$ by first making the change of variables $x \mapsto py$ and then using the convexity estimate

$$\varphi(py) \ge \varphi(y) + (p - 1)\, y\, \varphi'(y), \qquad (11.16)$$

which is illustrated in Figure 11.1. The exponential of this sum gives us a product, so Hölder’s inequality and the flop are almost ready to act. Still, some care is needed to avoid integrals which may be divergent, so we first restrict our attention to a finite interval $[0, A]$ to note that

$$\begin{aligned} I_A = \int_0^A x^\alpha \exp\Big( -\frac{\varphi(x)}{x} \Big)\, dx &= p^{\alpha+1} \int_0^{A/p} y^\alpha \exp\Big( -\frac{\varphi(py)}{py} \Big)\, dy \\ &\le p^{\alpha+1} \int_0^A y^\alpha \exp\Big( -\frac{\varphi(y) + (p-1)\, y\, \varphi'(y)}{py} \Big)\, dy, \end{aligned}$$

where in the second step we used the convexity bound (11.16) and extended the range of integration from $[0, A/p]$ to $[0, A]$.

[Figure 11.1: Carleson’s trick: one considers the scale shifted differences $\varphi(py) - \varphi(y)$.]

Fig. 11.1. The convexity bound $\varphi(py) \ge \varphi(y) + (p-1)\, y\, \varphi'(y)$ for $p > 1$ tells us how $\varphi$ changes under a scale shift. It also cooperates wonderfully with changes of variables, Hölder’s inequality, and the flop.

If we introduce the conjugate $q = p/(p-1)$ and apply Hölder’s inequality to the natural splitting suggested by $1/p + 1/q = 1$, we then find

$$\begin{aligned} I_A &\le p^{\alpha+1} \int_0^A \Big\{ y^{\alpha/p} \exp\Big( -\frac{\varphi(y)}{py} \Big) \Big\} \Big\{ y^{\alpha/q} \exp\Big( -\frac{\varphi'(y)}{q} \Big) \Big\}\, dy \\ &\le p^{\alpha+1}\, I_A^{1/p} \Big\{ \int_0^A y^\alpha \exp\big( -\varphi'(y) \big)\, dy \Big\}^{1/q}. \end{aligned}$$

Since $I_A < \infty$, we may divide by $I_A^{1/p}$ to complete the flop. Upon taking the $q$th power of the resulting inequality, we find

$$I_A = \int_0^A y^\alpha \exp\Big( -\frac{\varphi(y)}{y} \Big)\, dy \le p^{(\alpha+1)p/(p-1)} \int_0^A y^\alpha \exp\big( -\varphi'(y) \big)\, dy,$$

and this is actually more than we need.


To obtain the stated form (11.15) of Carleson’s inequality, we first let $A \to \infty$ and then let $p \to 1$. The familiar relation $\log(1 + \epsilon) = \epsilon + O(\epsilon^2)$ implies that $p^{p/(p-1)} \to e$, and hence $p^{(\alpha+1)p/(p-1)} \to e^{\alpha+1}$, as $p \to 1$, so the solution of the challenge problem is complete.
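One way to get a feel for the constant $e^{\alpha+1}$ is to try a family of convex functions where both sides of (11.15) have closed forms. The sketch below assumes our own test family $\varphi(x) = x^\beta$ with $\beta > 1$, so that $\varphi(x)/x = x^{\beta-1}$ and $\varphi'(x) = \beta x^{\beta-1}$; both integrals then reduce to $\int_0^\infty x^\alpha e^{-c x^m}\,dx = \Gamma(s)/(m c^s)$ with $s = (\alpha+1)/m$. The printed ratios (right side over left side) stay above 1 and drift down toward 1 as $\beta \downarrow 1$, which suggests that the constant $e^{\alpha+1}$ cannot be lowered. Logarithms via `math.lgamma` are used to avoid overflow; the function names are hypothetical.

```python
import math

def log_power_integral(alpha, c, m):
    # log of ∫_0^∞ x^alpha exp(-c x^m) dx = Γ(s) / (m c^s), s = (alpha + 1)/m,
    # valid for alpha > -1 and c, m > 0
    s = (alpha + 1) / m
    return math.lgamma(s) - math.log(m) - s * math.log(c)

def carleson_ratio(alpha, beta):
    # (right side) / (left side) of (11.15) for the convex function phi(x) = x**beta, beta > 1
    m = beta - 1
    log_lhs = log_power_integral(alpha, 1.0, m)
    log_rhs = (alpha + 1) + log_power_integral(alpha, beta, m)
    return math.exp(log_rhs - log_lhs)

for alpha in (-0.5, 0.0, 1.0):
    print(alpha, [round(carleson_ratio(alpha, b), 6) for b in (2.0, 1.5, 1.1, 1.01, 1.001)])
```

For example, with $\alpha = 0$ and $\beta = 2$ the ratio is exactly $e/2$.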


AN INFORMATIVE CHOICE OF φ


Part of the charm of Carleson’s inequality is that it provides a sly generalization of Carleman’s famous inequality, which we have met twice before (pages 27 and 128). In fact, one only needs to make a wise choice of $\varphi$.

Given the hint of this possibility and a little time for experimentation, one is quite likely to hit on the candidate suggested by Figure 11.2.

[Figure 11.2: A simple, but useful, observation: the slope $\varphi(x)/x$ of the chord increases with $x$.]

Fig. 11.2. If $y = \varphi(x)$ is the curve given by the linear interpolation of the points $(n, s(n))$ where $s(n) = \log(1/a_1) + \log(1/a_2) + \cdots + \log(1/a_n)$, then on the interval $(n-1, n)$ we have $\varphi'(x) = \log(1/a_n)$. If we assume that $a_n \ge a_{n+1}$, then $\varphi'(x)$ is nondecreasing and $\varphi(x)$ is convex. Also, since $\varphi(0) = 0$, the chord slope $\varphi(x)/x$ is monotone increasing.

For the function $\varphi$ defined there, we have the identity

$$\int_{n-1}^n \exp\big( -\varphi'(x) \big)\, dx = a_n \qquad (11.17)$$

and, since $\varphi(x)/x$ is nondecreasing, we also have the bound

$$\Big( \prod_{k=1}^n a_k \Big)^{1/n} = \exp\Big( -\frac{\varphi(n)}{n} \Big) \le \int_{n-1}^n \exp\Big( -\frac{\varphi(x)}{x} \Big)\, dx. \qquad (11.18)$$

When we sum the relations (11.17) and (11.18), we then find by invoking Carleson’s inequality (11.15) with $\alpha = 0$ that

$$\sum_{n=1}^\infty \Big( \prod_{k=1}^n a_k \Big)^{1/n} \le \int_0^\infty \exp\Big( -\frac{\varphi(x)}{x} \Big)\, dx \le e \int_0^\infty \exp\big( -\varphi'(x) \big)\, dx = e \sum_{n=1}^\infty a_n.$$
Thus we recover Carleman’s inequality under the added assumption that $a_1 \ge a_2 \ge a_3 \ge \cdots$. Moreover, this assumption incurs no loss of generality, as one easily confirms in Exercise 11.7.
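The construction of $\varphi$ can be followed numerically. The sketch below is our own illustration with a hypothetical helper `carleman_pieces`: for a random nonincreasing positive sequence it builds the piecewise linear $\varphi$, compares each geometric mean with a midpoint-rule value of $\int_{n-1}^n \exp(-\varphi(x)/x)\,dx$ as in (11.18), and compares the total with $e \sum a_n$. For the last comparison we rely on the finite-interval bound for $I_A$ (with $A = N$, $\alpha = 0$, and $p \to 1$) that the proof above delivers before $A \to \infty$.

```python
import math
import random

def carleman_pieces(a, M=400):
    """For a nonincreasing list a of positive numbers, return the geometric
    means G_n = (a_1 ... a_n)^(1/n) and midpoint-rule values of
    J_n = ∫_{n-1}^n exp(-phi(x)/x) dx, where phi interpolates (n, s(n))."""
    s = [0.0]                                  # s[n] = log(1/a_1) + ... + log(1/a_n)
    for ak in a:
        s.append(s[-1] - math.log(ak))
    G, J = [], []
    h = 1.0 / M
    for n in range(1, len(a) + 1):
        G.append(math.exp(-s[n] / n))          # = exp(-phi(n)/n)
        slope = s[n] - s[n - 1]                # phi'(x) = log(1/a_n) on (n-1, n)
        total = 0.0
        for j in range(M):
            x = n - 1 + (j + 0.5) * h
            phi = s[n - 1] + (x - (n - 1)) * slope
            total += math.exp(-phi / x)
        J.append(total * h)
    return G, J

random.seed(5)
a = sorted((random.uniform(0.01, 1.0) for _ in range(30)), reverse=True)
G, J = carleman_pieces(a)
print(sum(G), sum(J), math.e * sum(a))
```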


EXERCISES 


Exercise 11.1 (The $L^p$ Flop and a General Principle)
Suppose that $1 < \alpha < \beta$ and suppose that the bounded nonnegative functions $\varphi$ and $\psi$ satisfy the inequality

$$\int_0^T \varphi^\beta(x)\, dx \le C \int_0^T \varphi^\alpha(x)\, \psi(x)\, dx. \qquad (11.19)$$

Show that one can “clear $\varphi$ to the left” in the sense that one has

$$\int_0^T \varphi^\beta(x)\, dx \le C^{\beta/(\beta-\alpha)} \int_0^T \psi^{\beta/(\beta-\alpha)}(x)\, dx. \qquad (11.20)$$

The bound (11.20) is just one example of a general (but vague) principle: If we have a factor on both sides of an inequality and if it appears to a smaller power on the “right” than on the “left,” then we can clear the factor to the left to obtain a new, and potentially useful, bound.
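Since Hölder’s inequality holds for finite sums just as for integrals, the implication (11.19) ⇒ (11.20) can be spot checked with sums over sample points standing in for the integrals. In the sketch below (our own illustration; `clear_to_left` is a hypothetical name) $C$ is taken to be the smallest constant for which (11.19) holds for the given samples, and random exponents satisfy $1 \le \alpha < \beta$.

```python
import random

def clear_to_left(phi, psi, alpha, beta):
    # Discrete sketch: sums over sample points play the role of the integrals.
    # C is chosen as the smallest constant for which (11.19) holds for these samples.
    lhs = sum(p ** beta for p in phi)
    C = lhs / sum(p ** alpha * q for p, q in zip(phi, psi))
    r = beta / (beta - alpha)
    rhs = C ** r * sum(q ** r for q in psi)    # right side of (11.20)
    return lhs, rhs

random.seed(6)
for _ in range(1000):
    n = random.randint(1, 20)
    phi = [random.uniform(0.01, 2.0) for _ in range(n)]
    psi = [random.uniform(0.01, 2.0) for _ in range(n)]
    alpha = random.uniform(1.0, 3.0)
    beta = alpha + random.uniform(0.1, 3.0)
    lhs, rhs = clear_to_left(phi, psi, alpha, beta)
    assert lhs <= rhs * (1 + 1e-12)
```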


Exercise 11.2 (Rudimentary Example of a General Principle) 
The principle of Exercise 11.1 can be illustrated with the simplest of 
tools. For example, show for nonnegative $x$ and $y$ that

$$2x^3 \le y^3 + y^2 x + y x^2 \quad \text{implies} \quad x^3 \le 2y^3.$$


Exercise 11.3 (An Exam-Time Discovery of F. Riesz) 
Show that there is a constant $A$ (not depending on $u$ and $v$) such that for each pair of functions $u$ and $v$ on $[-\pi, \pi]$ for which one has

$$\int_{-\pi}^{\pi} v^4(\theta)\, d\theta \le \int_{-\pi}^{\pi} u^4(\theta)\, d\theta + 6 \int_{-\pi}^{\pi} u^2(\theta)\, v^2(\theta)\, d\theta, \qquad (11.21)$$

one also has the bound

$$\int_{-\pi}^{\pi} v^4(\theta)\, d\theta \le A \int_{-\pi}^{\pi} u^4(\theta)\, d\theta. \qquad (11.22)$$

According to J.E. Littlewood (1988, p. 194), F. Riesz was trying to set an examination problem when he observed almost by accident that the bound (11.21) holds for the real part $u$ and the imaginary part $v$ of $f(e^{i\theta})$ when $f(z)$ is a continuous function that is analytic in the unit disk. This observation and the inference (11.22) subsequently put Riesz on the trail of some of his most important discoveries.


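Exercise 11.4 (The $L^p$ Norm of the Average)
Show that if $f : [0,\infty) \to \mathbb{R}^+$ is integrable and $p > 1$, then one has

$$\int_0^\infty \Big\{ \frac{1}{x} \int_0^x f(u)\, du \Big\}^p dx \le \Big( \frac{p}{p-1} \Big)^p \int_0^\infty f^p(x)\, dx. \qquad (11.23)$$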


Exercise 11.5 (Hardy and the Qualitative Version of Hilbert) 
Use the discrete version (11.9) of Hardy’s inequality to prove that

$$S = \sum_{n=1}^\infty a_n^2 < \infty \quad \text{implies that} \quad \sum_{m=1}^\infty \sum_{n=1}^\infty \frac{a_m a_n}{m + n} \ \text{converges}.$$

This was the qualitative version of Hilbert’s inequality that Hardy had in mind when he first considered Problems 11.1 and 11.2.
Exercise 11.6 (Optimality? — It Depends on Context)

Many inequalities which cannot be improved in general will nevertheless permit improvements under special circumstances. An elegant illustration of this possibility was given in a 1991 American Mathematical Monthly problem posed by Walther Janous. Readers were challenged to prove that for all $0 < x < 1$ and all $N \ge 1$, one has the bound

$$\sum_{j=1}^N \Big\{ \frac{1 + x + x^2 + \cdots + x^{j-1}}{j} \Big\}^2 \le (4 \log 2)\,(1 + x^2 + x^4 + \cdots + x^{2N-2}).$$

(a) Prove that a direct application of Hardy’s inequality provides a similar bound where $4 \log 2$ is replaced by 4. Since $\log 2 = 0.693\ldots$, we then see that Janous’s bound beats Hardy’s in this particular instance.

(b) Prove Janous’s inequality and show that one cannot replace $4 \log 2$ with a constant $C < 4 \log 2$.
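A grid search gives a feel for how tight Janous’s constant is. The sketch below (our own illustration; the grid of $x$ values and the choices of $N$ are arbitrary, and `janous_ratio` is a hypothetical name) computes the ratio of the left side to $1 + x^2 + \cdots + x^{2N-2}$; near $x = 1$ with large $N$ the ratio approaches $4 \log 2 \approx 2.7726$ from below.

```python
import math

def janous_ratio(x, N):
    # (left side) / (1 + x^2 + ... + x^(2N-2)) for Janous's inequality
    lhs, partial, power = 0.0, 0.0, 1.0
    for j in range(1, N + 1):
        partial += power                 # partial = 1 + x + ... + x^(j-1)
        power *= x
        lhs += (partial / j) ** 2
    rhs = sum(x ** (2 * k) for k in range(N))
    return lhs / rhs

grid = [i / 100 for i in range(1, 100)]
worst = max(janous_ratio(x, N) for x in grid for N in (1, 5, 50, 500))
print(worst, 4 * math.log(2))
```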


Exercise 11.7 (Confirmation of the Obvious)
Show that if $a_1 \ge a_2 \ge a_3 \ge \cdots$ and if $b_1, b_2, b_3, \ldots$ is any rearrangement of the sequence $a_1, a_2, a_3, \ldots$, then for each $N = 1, 2, \ldots$ one has

$$\sum_{n=1}^N (b_1 b_2 \cdots b_n)^{1/n} \le \sum_{n=1}^N (a_1 a_2 \cdots a_n)^{1/n}.$$

Thus, in the proof of Carleman’s inequality, one can assume without loss of generality that $a_1 \ge a_2 \ge a_3 \ge \cdots$ since a rearrangement does not change the right side.
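For a small finite list one can confirm the claim by brute force over all rearrangements. This sketch is our own illustration and covers only permutations of a finite list (the exercise allows rearrangements of an infinite sequence); the helper name `geometric_mean_sum` is hypothetical.

```python
import itertools
import math
import random

def geometric_mean_sum(b):
    # Σ_{n=1}^N (b_1 b_2 ... b_n)^(1/n) for a finite list b of positive numbers
    total, log_prod = 0.0, 0.0
    for n, bn in enumerate(b, start=1):
        log_prod += math.log(bn)
        total += math.exp(log_prod / n)
    return total

random.seed(7)
a = sorted((random.uniform(0.1, 1.0) for _ in range(6)), reverse=True)
best = geometric_mean_sum(a)
# smallest value of (decreasing arrangement) - (any arrangement); should be >= 0
worst_gap = min(best - geometric_mean_sum(list(p)) for p in itertools.permutations(a))
print(best, worst_gap)
```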


Exercise 11.8 (Kronecker’s Lemma) 


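Prove that for any sequence $a_1, a_2, \ldots$ of real or complex numbers one has the inference

$$\sum_{n=1}^\infty \frac{a_n}{n} \ \text{converges} \quad \Longrightarrow \quad \lim_{n \to \infty} (a_1 + a_2 + \cdots + a_n)/n = 0. \qquad (11.25)$$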


Like Hardy’s inequality, this result tells us how to convert one type 
of information about averages to another type of information. This 
implication is particularly useful in probability theory where it is used 
to draw a connection between the convergence of certain random sums 
and the famous law of large numbers. 


12 
The $k$th elementary symmetric function of the $n$ variables $x_1, x_2, \ldots, x_n$ is the polynomial defined by the formula

$$e_k(x_1, x_2, \ldots, x_n) = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} x_{i_1} x_{i_2} \cdots x_{i_k}.$$

The first three of these polynomials are simply

$$e_0(x_1, x_2, \ldots, x_n) = 1, \quad e_1(x_1, x_2, \ldots, x_n) = x_1 + x_2 + \cdots + x_n, \quad \text{and} \quad e_2(x_1, x_2, \ldots, x_n) = \sum_{1 \le j < k \le n} x_j x_k,$$

while the $n$th elementary symmetric function is simply the full product $e_n(x_1, x_2, \ldots, x_n) = x_1 x_2 \cdots x_n$.

These functions are used in virtually every part of the mathematical sciences, yet they draw much of their importance from the connection they provide between the coefficients of a polynomial and functions of its roots. To be explicit, if the polynomial $P(t)$ is written as the product $P(t) = (t - x_1)(t - x_2) \cdots (t - x_n)$, then it also has the representation

$$P(t) = t^n - e_1(x)\, t^{n-1} + \cdots + (-1)^k e_k(x)\, t^{n-k} + \cdots + (-1)^n e_n(x), \qquad (12.1)$$

where for brevity we have written $e_k(x)$ in place of $e_k(x_1, x_2, \ldots, x_n)$.
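The representation (12.1) also suggests an efficient way to compute all the $e_k$ at once: $e_k$ is the coefficient of $z^k$ in $\prod_i (1 + x_i z)$, so one can multiply in one factor at a time. The sketch below (our own illustration, with hypothetical helper names) compares this with the defining sum over increasing index tuples and checks (12.1) at a sample point $t$.

```python
import itertools
import math
import random

def elementary_symmetric(xs):
    # e[k] = e_k(x_1, ..., x_n): coefficients of prod (1 + x_i z), one factor at a time
    e = [1.0] + [0.0] * len(xs)
    for x in xs:
        for k in range(len(e) - 1, 0, -1):
            e[k] += x * e[k - 1]
    return e

def e_by_definition(xs, k):
    # direct sum over index tuples i_1 < i_2 < ... < i_k
    return sum(math.prod(c) for c in itertools.combinations(xs, k))

def P(xs, t):
    # P(t) = (t - x_1)(t - x_2) ... (t - x_n)
    return math.prod(t - x for x in xs)

random.seed(8)
xs = [random.uniform(-2, 2) for _ in range(5)]
e = elementary_symmetric(xs)
n = len(xs)
t = 0.7
vieta = sum((-1) ** k * e[k] * t ** (n - k) for k in range(n + 1))   # right side of (12.1)
print(P(xs, t), vieta)
```

For instance, `elementary_symmetric([1.0, 2.0, 3.0])` returns `[1.0, 6.0, 11.0, 6.0]`, matching $(t-1)(t-2)(t-3) = t^3 - 6t^2 + 11t - 6$.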


THE CLASSICAL INEQUALITIES OF NEWTON AND MACLAURIN 


The elementary polynomials have many connections with the theory 
of inequalities. Two of the most famous of these date back to the great 
Isaac Newton (1642-1727) and the Scottish prodigy Colin Maclaurin 
(1696-1746). Their namesake inequalities are best expressed in terms of the averages

$$E_k(x) = E_k(x_1, x_2, \ldots, x_n) = \frac{e_k(x_1, x_2, \ldots, x_n)}{\binom{n}{k}},$$

which bring us to our first challenge problem.


Problem 12.1 (Inequalities of Newton and Maclaurin) 
Show that for all $x \in \mathbb{R}^n$ one has Newton’s inequalities

$$E_{k-1}(x)\, E_{k+1}(x) \le E_k^2(x) \qquad \text{for } 0 < k < n, \qquad (12.2)$$

and check that they imply Maclaurin’s inequalities which assert that

$$E_n^{1/n}(x) \le E_{n-1}^{1/(n-1)}(x) \le \cdots \le E_2^{1/2}(x) \le E_1(x) \qquad (12.3)$$

for all $x = (x_1, x_2, \ldots, x_n)$ such that $x_k \ge 0$ for all $1 \le k \le n$.
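A randomized check of both families of inequalities can build confidence before the proof. The sketch below is our own illustration (the sampling ranges, dimensions, and relative tolerances are arbitrary choices, and `averaged_symmetric` is a hypothetical name): it tests Newton’s inequalities for arbitrary real vectors and Maclaurin’s for nonnegative ones.

```python
import math
import random

def averaged_symmetric(xs):
    # E_k = e_k / C(n, k) for k = 0, ..., n
    n = len(xs)
    e = [1.0] + [0.0] * n
    for x in xs:
        for k in range(n, 0, -1):
            e[k] += x * e[k - 1]
    return [e[k] / math.comb(n, k) for k in range(n + 1)]

random.seed(9)
for _ in range(2000):
    n = random.randint(2, 7)
    # Newton's inequalities (12.2) for arbitrary real x
    E = averaged_symmetric([random.uniform(-3, 3) for _ in range(n)])
    for k in range(1, n):
        scale = max(1.0, E[k] ** 2, abs(E[k - 1] * E[k + 1]))
        assert E[k - 1] * E[k + 1] <= E[k] ** 2 + 1e-9 * scale
    # Maclaurin's inequalities (12.3) for nonnegative x
    E = averaged_symmetric([random.uniform(0, 3) for _ in range(n)])
    roots = [E[k] ** (1.0 / k) for k in range(1, n + 1)]   # roots[k-1] = E_k^(1/k)
    for k in range(1, n):
        assert roots[k] <= roots[k - 1] * (1 + 1e-9)
```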


ORIENTATION AND THE AM-GM CONNECTION 


If we take $n = 3$ and set $x = (x, y, z)$, then Maclaurin’s inequalities simply say

$$(xyz)^{1/3} \le \Big( \frac{xy + xz + yz}{3} \Big)^{1/2} \le \frac{x + y + z}{3},$$

which is a sly refinement of the AM-GM inequality. In the general case, Maclaurin’s inequalities insert a whole line of ever increasing expressions between the geometric mean $(x_1 x_2 \cdots x_n)^{1/n}$ and the arithmetic mean $(x_1 + x_2 + \cdots + x_n)/n$.


FROM NEWTON TO MACLAURIN BY GEOMETRY 


For a vector $x \in \mathbb{R}^n$ with only positive coordinates, the values $\{E_k(x) : 0 \le k \le n\}$ are also positive, so we can take logarithms of Newton’s inequalities to deduce that

$$\frac{\log E_{k-1}(x) + \log E_{k+1}(x)}{2} \le \log E_k(x) \qquad (12.4)$$

for all $1 \le k < n$. In particular, we see for $x \in (0,\infty)^n$ that Newton’s inequalities are equivalent to the assertion that the piecewise linear curve determined by the point set $\{(k, \log E_k(x)) : 0 \le k \le n\}$ is concave.

If $L_k$ denotes the line determined by the points $(0,0) = (0, \log E_0(x))$ and $(k, \log E_k(x))$, then as Figure 12.1 suggests, the slope of $L_{k+1}$ is never larger than the slope of $L_k$ for any $k = 1, 2, \ldots, n-1$. Since the slope of $L_k$ is $\log E_k(x)/k$, we find $\log E_{k+1}(x)/(k+1) \le \log E_k(x)/k$, and this is precisely the $k$th of Maclaurin’s inequalities.

[Figure 12.1: The points $(k, y_k)$ with $y_k = \log E_k(x)$. The inequalities of Maclaurin call on the observation that the successive chords from the origin have nonincreasing slopes.]

Fig. 12.1. If $E_k(x) > 0$ for all $1 \le k \le n$, then Newton’s inequalities are equivalent to the assertion that the piecewise linear curve determined by the points $(k, y_k)$, $1 \le k \le n$, is concave. Maclaurin’s inequalities capitalize on just one part of this geometry.

The real challenge is to prove Newton’s inequalities. As one might expect for a result that is both ancient and fundamental, there are many possible approaches. Most of these depend on calculus in one way or another, but Newton never published a proof of his namesake inequalities, so we do not know if his argument relied on his “method of fluxions.”


POLYNOMIALS AND THEIR DERIVATIVES 


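Even if Newton took a different path, it does make sense to ask what the derivative $P'(t)$ might tell us about the special polynomials $E_k(x_1, x_2, \ldots, x_n)$, $1 \le k \le n$. If we write the identity (12.1) in the form

$$P(t) = (t - x_1)(t - x_2) \cdots (t - x_n) = \sum_{k=0}^n (-1)^k \binom{n}{k} E_k(x_1, x_2, \ldots, x_n)\, t^{n-k}, \qquad (12.5)$$

then for the normalized derivative $Q(t) = P'(t)/n$ we find

$$\begin{aligned} Q(t) &= \frac{1}{n} \sum_{k=0}^{n-1} (-1)^k \binom{n}{k} (n-k)\, E_k(x_1, x_2, \ldots, x_n)\, t^{n-k-1} \\ &= \sum_{k=0}^{n-1} (-1)^k \binom{n-1}{k} E_k(x_1, x_2, \ldots, x_n)\, t^{n-1-k}, \end{aligned}$$

where in the second line we used the familiar identity

$$\binom{n}{k} \frac{n-k}{n} = \frac{(n-1)!}{k!\,(n-k-1)!} = \binom{n-1}{k}.$$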
If the values $x_k$, $k = 1, 2, \ldots, n$, are elements of the interval $[a, b]$, then the polynomial $P(t)$ has $n$ real roots in $[a, b]$, and Rolle’s theorem tells us that the derivative $P'(t)$ must have $n - 1$ real roots in $[a, b]$ (counted with multiplicity). If we denote these roots by $\{y_1, y_2, \ldots, y_{n-1}\}$, then we also have the identity

$$Q(t) = (t - y_1)(t - y_2) \cdots (t - y_{n-1}) = \sum_{k=0}^{n-1} (-1)^k \binom{n-1}{k} E_k(y_1, y_2, \ldots, y_{n-1})\, t^{n-1-k}.$$

If we now equate the coefficients in our two formulas for $Q(t)$, we find that for all $0 \le k \le n-1$ we have the truly remarkable identity

$$E_k(x_1, x_2, \ldots, x_n) = E_k(y_1, y_2, \ldots, y_{n-1}). \qquad (12.6)$$
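The identity (12.6) is striking enough to deserve a numerical check. The sketch below is our own illustration with hypothetical helper names: for distinct $x_1 < \cdots < x_n$, the derivative $P'$ changes sign exactly once between consecutive $x$’s, so bisection on each gap finds the $n - 1$ roots $y_j$; we then compare the averages $E_k$ of the two vectors.

```python
import math
import random

def averaged_symmetric(xs):
    # E_k = e_k / C(m, k), k = 0, ..., m, where m = len(xs)
    m = len(xs)
    e = [1.0] + [0.0] * m
    for x in xs:
        for k in range(m, 0, -1):
            e[k] += x * e[k - 1]
    return [e[k] / math.comb(m, k) for k in range(m + 1)]

def dP(xs, t):
    # P'(t) for P(t) = prod (t - x_i), by the product rule
    return sum(math.prod(t - x for j, x in enumerate(xs) if j != i)
               for i in range(len(xs)))

def derivative_roots(xs, steps=100):
    # for distinct x's, P' has exactly one root strictly between consecutive
    # x's and changes sign there, so bisection on each gap finds all n - 1 roots
    xs = sorted(xs)
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        sign_lo = dP(xs, lo) > 0
        for _ in range(steps):
            mid = 0.5 * (lo + hi)
            if (dP(xs, mid) > 0) == sign_lo:
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots

random.seed(10)
xs = [random.uniform(0, 1) for _ in range(6)]
ys = derivative_roots(xs)
Ex, Ey = averaged_symmetric(xs), averaged_symmetric(ys)
print([Ex[k] - Ey[k] for k in range(len(ys) + 1)])   # all close to 0
```

For $x = (0, 1, 2)$ the derivative $3t^2 - 6t + 2$ has roots $1 \pm 1/\sqrt{3}$, and indeed $E_1 = 1$ and $E_2 = 2/3$ for both vectors.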


WHY IS IT SO REMARKABLE?


The left-hand side of the identity (12.6) is a function of the $n$-vector $x = (x_1, x_2, \ldots, x_n)$ while the right side is a function of the $(n-1)$-vector $y = (y_1, y_2, \ldots, y_{n-1})$. Thus, if we can prove a relation such as

$$0 \le F\big(E_0(y), E_1(y), \ldots, E_{n-1}(y)\big) \qquad \text{for all } y \in [a, b]^{n-1},$$

then it follows that we also have the relation

$$0 \le F\big(E_0(x), E_1(x), \ldots, E_{n-1}(x)\big) \qquad \text{for all } x \in [a, b]^n.$$

That is, any inequality (or identity) which provides a relation between the quantities $E_0(y), E_1(y), \ldots, E_{n-1}(y)$ and which is valid for all values of $y \in [a, b]^{n-1}$ extends automatically to a corresponding relation for the quantities $E_0(x), E_1(x), \ldots, E_{n-1}(x)$ which is valid for all values of $x \in [a, b]^n$.

This presents a rare but valuable situation where to prove a relation 
for functions of n variables it suffices to prove an analogous relation for 
functions of just n — 1 variables. This observation can be used in an 
ad hoc way to produce many special identities which otherwise would 
be completely baffling, and it can also be used systematically to provide 
seamless induction proofs for results such as Newton’s inequalities. 


INDUCTION ON THE NUMBER OF VARIABLES 


Consider now the induction hypothesis $H_n$ which asserts that

$$E_{j-1}(x_1, x_2, \ldots, x_n)\, E_{j+1}(x_1, x_2, \ldots, x_n) \le E_j^2(x_1, x_2, \ldots, x_n) \qquad (12.7)$$

for all $x \in \mathbb{R}^n$ and all $1 \le j < n$. For $n = 1$ this assertion is empty, so our induction argument begins with $H_2$, in which case we just need to prove one inequality,

$$E_0(x_1, x_2)\, E_2(x_1, x_2) \le E_1^2(x_1, x_2) \quad \text{or} \quad x_1 x_2 \le \Big( \frac{x_1 + x_2}{2} \Big)^2. \qquad (12.8)$$

As we have seen a dozen times before, this holds for all real $x_1$ and $x_2$ because of the trivial bound $(x_1 - x_2)^2 \ge 0$.

Logically, we could now address the general induction step, but we 
first need a clear understanding of the underlying pattern. Thus, we 
consider the hypothesis H3 which consists of the two assertions: 


$$E_0(x_1, x_2, x_3)\, E_2(x_1, x_2, x_3) \le E_1^2(x_1, x_2, x_3), \qquad (12.9)$$

$$E_1(x_1, x_2, x_3)\, E_3(x_1, x_2, x_3) \le E_2^2(x_1, x_2, x_3). \qquad (12.10)$$


Now the “remarkable identity” (12.6) springs into action. The assertion 
(12.9) says for three variables what the inequality (12.8) says for two, 
therefore (12.6) tells us that our first inequality (12.9) is true. We have 
obtained half of the hypothesis H3 virtually for free. 

To complete the proof of $H_3$, we now only need to prove the second bound (12.10). To make the task clear we first rewrite the bound (12.10) in longhand as

$$\Big( \frac{x_1 + x_2 + x_3}{3} \Big)\, x_1 x_2 x_3 \le \Big( \frac{x_1 x_2 + x_1 x_3 + x_2 x_3}{3} \Big)^2. \qquad (12.11)$$

This bound is trivial if $x_1 x_2 x_3 = 0$, so there is no loss of generality if we assume $x_1 x_2 x_3 \ne 0$. We can then divide our bound by $(x_1 x_2 x_3)^2$ to get

$$\frac{1}{3} \Big( \frac{1}{x_1 x_2} + \frac{1}{x_1 x_3} + \frac{1}{x_2 x_3} \Big) \le \frac{1}{9} \Big( \frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} \Big)^2,$$

which may be expanded and simplified to

$$\frac{1}{x_1 x_2} + \frac{1}{x_1 x_3} + \frac{1}{x_2 x_3} \le \frac{1}{x_1^2} + \frac{1}{x_2^2} + \frac{1}{x_3^2}.$$

At this stage of our Master Class, this inequality is almost obvious. For a thematic proof, one can apply Cauchy’s inequality to the pair of vectors $(1/x_1, 1/x_3, 1/x_2)$ and $(1/x_2, 1/x_1, 1/x_3)$, or, a bit more generally, one can sum the three AM-GM bounds

$$\frac{1}{x_j x_k} \le \frac{1}{2} \Big( \frac{1}{x_j^2} + \frac{1}{x_k^2} \Big), \qquad 1 \le j < k \le 3.$$

Thus, the proof of $H_3$ is complete, and, moreover, we have found a pattern that should guide us through the general induction step.
A PATTERN CONFIRMED


The general hypothesis $H_n$ consists of $n - 1$ inequalities which may be viewed in two groups. First, for $x = (x_1, x_2, \ldots, x_n)$ we have the $n - 2$ inequalities which involve only $E_j(x)$ with $0 \le j < n$,

$$E_{k-1}(x)\, E_{k+1}(x) \le E_k^2(x) \qquad \text{for } 1 \le k < n - 1, \qquad (12.12)$$

then we have one final inequality which involves $E_n(x)$,

$$E_{n-2}(x)\, E_n(x) \le E_{n-1}^2(x). \qquad (12.13)$$

In parallel with the analysis of $H_3$, we now see that all of the inequalities in the first group (12.12) follow from the induction hypothesis $H_{n-1}$ and the identity (12.6). All of the inequalities of $H_n$ have come to us for free, except for one.

If we write the bound (12.13) in longhand and use $\hat{x}_j$ as a symbol to suggest that $x_j$ is omitted, then we see that it remains for us to prove that we have the relation

$$x_1 x_2 \cdots x_n \cdot \frac{1}{\binom{n}{2}} \sum_{1 \le j < k \le n} x_1 \cdots \hat{x}_j \cdots \hat{x}_k \cdots x_n \le \Big\{ \frac{1}{n} \sum_{j=1}^n x_1 \cdots \hat{x}_j \cdots x_n \Big\}^2. \qquad (12.14)$$

In parallel with our earlier experience, we note that there is no loss of generality in assuming $x_1 x_2 \cdots x_n \ne 0$. After division by $(x_1 x_2 \cdots x_n)^2$ and some simplification, we see that the bound (12.14) is equivalent to

$$\frac{1}{\binom{n}{2}} \sum_{1 \le j < k \le n} \frac{1}{x_j x_k} \le \Big\{ \frac{1}{n} \sum_{j=1}^n \frac{1}{x_j} \Big\}^2. \qquad (12.15)$$

We could now stick with the pattern that worked for $H_3$, but there is a more graceful way to finish which is almost staring us in the face. If we adopt the language of symmetric functions, the target bound (12.15) may be written more systematically as

$$E_0(1/x_1, 1/x_2, \ldots, 1/x_n)\, E_2(1/x_1, 1/x_2, \ldots, 1/x_n) \le E_1^2(1/x_1, 1/x_2, \ldots, 1/x_n),$$

and one now sees that this inequality is covered by the first bound of the group (12.12). Thus, the proof of Newton’s inequalities is complete.
EQUALITY IN THE BOUNDS OF NEWTON OR MACLAURIN


From Figure 12.1, we see that we have equality in the $k$th Maclaurin bound $y_{k+1}/(k+1) \le y_k/k$ if and only if the dotted and the dashed lines have the same slope. By the concavity of the piecewise linear curve through the points $\{(j, y_j) : 0 \le j \le n\}$, this is possible if and only if the three points $(k-1, y_{k-1})$, $(k, y_k)$, and $(k+1, y_{k+1})$ all lie on a straight line. This is equivalent to the assertion $y_k = (y_{k-1} + y_{k+1})/2$, so, by geometry, we find that equality holds in the $k$th Maclaurin bound if and only if it holds in the $k$th Newton bound.

It takes only a moment to check that equality holds in each of Newton’s bounds when $x_1 = x_2 = \cdots = x_n$, and there are several ways to prove that this is the only circumstance where equality is possible. For us, perhaps the easiest way to prove this assertion is by making some small changes to our induction argument. In fact, the diligent reader will surely want to confirm that our induction argument can be repeated almost word for word while including in the induction hypothesis (12.7) the condition for strict inequality.


PASSAGE TO MUIRHEAD 


David Hilbert once said, “The art of doing mathematics consists in 
finding that special case which contains all the germs of generality.” The 
next challenge problem is surely more modest than the examples that 
Hilbert had in mind, but in this chapter and the next we will see that 
it amply illustrates Hilbert’s point. 


Problem 12.2 (A Symmetric Appetizer) 
Show that for nonnegative x, y, and z one has the bound 


$$x^2y^3 + x^2z^3 + y^2x^3 + y^2z^3 + z^2x^3 + z^2y^3 \le xy^4 + xz^4 + yx^4 + yz^4 + zx^4 + zy^4, \tag{12.16}$$


and take inspiration from your discoveries to generalize this result as 
widely as you can. 


MAKING CONNECTIONS 


We have already met several problems where the AM-GM inequal- 
ity helped us to understand the relationship between two homogeneous 
polynomials, and if we hope to use a similar idea here we need to show 
that each summand on the left can be written as a weighted geometric 
mean of the summands on the right. After some experimentation, one 
is sure to observe that for any nonnegative $a$ and $b$ we have the product representation $a^2b^3 = (ab^4)^{2/3}(a^4b)^{1/3}$. The weighted AM-GM inequality (2.9) then gives us the bound

$$a^2b^3 = (ab^4)^{2/3}(a^4b)^{1/3} \le \frac{2}{3}ab^4 + \frac{1}{3}a^4b, \tag{12.17}$$

and now we just need to see how this may be applied.

If we replace $(a, b)$ in turn by the ordered pairs $(x, y)$ and $(y, x)$, then the sum of the resulting bounds gives us $x^2y^3 + y^2x^3 \le xy^4 + x^4y$ and, in exactly the same way, we can get two analogous inequalities by summing the bound (12.17) for the two pairs $(x, z)$ and $(z, x)$, and the two pairs $(y, z)$ and $(z, y)$. Finally, the sum of the resulting three bounds gives us our target inequality (12.16).
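The bookkeeping behind (12.17) and the pairing step is easy to confirm mechanically. In this small sketch (all helper names are hypothetical), `paired_bound` adds the bound (12.17) over all six ordered pairs of variables, and that sum collapses exactly to the right side of (12.16).

```python
import random


def amgm_bound(a, b):
    """Right side of (12.17): (2/3) a b^4 + (1/3) a^4 b, an upper bound for a^2 b^3."""
    return (2 / 3) * a * b**4 + (1 / 3) * a**4 * b


def lhs_1216(x, y, z):
    """Left side of (12.16): the symmetric sum of the terms u^2 v^3."""
    return x**2 * y**3 + x**2 * z**3 + y**2 * x**3 + y**2 * z**3 + z**2 * x**3 + z**2 * y**3


def rhs_1216(x, y, z):
    """Right side of (12.16): the symmetric sum of the terms u v^4."""
    return x * y**4 + x * z**4 + y * x**4 + y * z**4 + z * x**4 + z * y**4


def paired_bound(x, y, z):
    """Sum of (12.17) over all six ordered pairs; algebraically equal to rhs_1216."""
    pairs = [(x, y), (y, x), (x, z), (z, x), (y, z), (z, y)]
    return sum(amgm_bound(a, b) for a, b in pairs)


rng = random.Random(3)
x, y, z = (rng.uniform(0.0, 3.0) for _ in range(3))
print(lhs_1216(x, y, z), paired_bound(x, y, z), rhs_1216(x, y, z))
```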


PASSAGE TO AN APPROPRIATE GENERALIZATION 


This argument can be applied almost without modification to any symmetric sum of two-term products $x^a y^b$, but one may feel some uncertainty about sums that contain triple products such as $x^a y^b z^c$. Such sums may have many terms, and complexity can get the best of us unless we develop a systematic approach.

Fortunately, geometry points the way. From Figure 12.2 one sees at a glance that $(2,3) = \frac{2}{3}(1,4) + \frac{1}{3}(4,1)$, and, by exponentiation, we see that this recaptures our decomposition $a^2b^3 = (ab^4)^{2/3}(a^4b)^{1/3}$. Geometry makes quick work of such two-term decompositions, but the real benefit of the geometric point of view is that it suggests useful representations for products of three or more variables. The key is to find the right analog of Figure 12.2.

In abstract terms, the solution of the second challenge problem pivoted on the observation that $(2,3)$ is in the convex hull of $(1,4)$ and its permutation $(4,1)$. Now, more generally, given any pair of $n$-vectors $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$, we can consider an analogous situation where $\alpha$ is contained in the convex hull $H(\beta)$ of the set of points $(\beta_{\tau(1)}, \beta_{\tau(2)}, \ldots, \beta_{\tau(n)})$ which are determined by letting $\tau$ run over the set $S_n$ of all $n!$ permutations of $\{1, 2, \ldots, n\}$.

This suggestion points us to a far-reaching generalization of our second challenge problem. The result is due to another Scot, Robert Franklin Muirhead (1860-1941). It has been known since 1903, and, at first, it may look complicated. Nevertheless, with experience one finds that it has both simplicity and a timeless grace.
[Figure 12.2: the points $(1,4) = (\beta_1, \beta_2)$, $(4,1) = (\beta_2, \beta_1)$, and $\alpha = (\alpha_1, \alpha_2)$. The point $(2,3)$ is in the convex hull of the permuted points $(1,4)$ and $(4,1)$: $(2,3) = \frac{2}{3}(1,4) + \frac{1}{3}(4,1)$.]

Fig. 12.2. If the point $(\alpha_1, \alpha_2)$ is in the convex hull of $(\beta_1, \beta_2)$ and $(\beta_2, \beta_1)$, then $x^{\alpha_1}y^{\alpha_2}$ is bounded by a linear combination of $x^{\beta_1}y^{\beta_2}$ and $x^{\beta_2}y^{\beta_1}$. This leads to some engaging inequalities when applied to symmetric sums of products, and there are exceptionally revealing generalizations of these bounds.


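Problem 12.3 (Muirhead’s inequality)
Given that $\alpha \in H(\beta)$ where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$, show that for all positive $x_1, x_2, \ldots, x_n$ one has the bound

$$\sum_{\sigma \in S_n} x_{\sigma(1)}^{\alpha_1} x_{\sigma(2)}^{\alpha_2} \cdots x_{\sigma(n)}^{\alpha_n} \le \sum_{\sigma \in S_n} x_{\sigma(1)}^{\beta_1} x_{\sigma(2)}^{\beta_2} \cdots x_{\sigma(n)}^{\beta_n}. \tag{12.18}$$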


A QUICK ORIENTATION 


To get familiar with this notation, one might first check that Muirhead’s inequality does indeed contain the bound given by our second challenge problem (page 184). In that case, $S_3$ is the set of six permutations of the set $\{1, 2, 3\}$, and we have $(x_1, x_2, x_3) = (x, y, z)$. We also have

$$(\alpha_1, \alpha_2, \alpha_3) = (2, 3, 0) \quad \text{and} \quad (\beta_1, \beta_2, \beta_3) = (1, 4, 0),$$

and since $(2,3,0) = \frac{2}{3}(1,4,0) + \frac{1}{3}(4,1,0)$ we find that $\alpha \in H(\beta)$. Finally, one has the $\alpha$-sum

$$\sum_{\sigma \in S_3} x_{\sigma(1)}^{2} x_{\sigma(2)}^{3} x_{\sigma(3)}^{0} = x^2y^3 + x^2z^3 + y^2x^3 + y^2z^3 + z^2x^3 + z^2y^3,$$

while the $\beta$-sum is given by

$$\sum_{\sigma \in S_3} x_{\sigma(1)}^{1} x_{\sigma(2)}^{4} x_{\sigma(3)}^{0} = xy^4 + xz^4 + yx^4 + yz^4 + zx^4 + zy^4,$$

so Muirhead’s inequality (12.18) does indeed give us a generalization of our second challenge bound (12.16).

[Figure 12.3: the set $H(\beta)$ in $\mathbb{R}^3$ is a subset of the 2-dimensional hyperplane in $\mathbb{R}^3$ spanned by the six points obtained by permuting the coordinates of $\beta = (\beta_1, \beta_2, \beta_3)$.]

Fig. 12.3. The geometry of the condition $\alpha \in H(\beta)$ is trivial in dimension 2, and this figure shows how it may be visualized in dimension 3. In higher dimensions, geometric intuition is still suggestive, but algebra serves as our unfailing guide.

Finally, before we address the proof, we should note that there is no constraint on the sign of the coordinates of $\alpha$ and $\beta$ in Muirhead’s inequality. Thus, for example, if we take $\alpha = (1/2, 1/2, 0)$ and take $\beta = (-1, 2, 0)$, then Muirhead’s inequality tells us that for positive $x$, $y$, and $z$ one has

$$2\left(\sqrt{xy} + \sqrt{xz} + \sqrt{yz}\right) \le \frac{x^2}{y} + \frac{x^2}{z} + \frac{y^2}{x} + \frac{y^2}{z} + \frac{z^2}{x} + \frac{z^2}{y}. \tag{12.19}$$
This instructive bound can be proved in many ways; for example, both 
Cauchy’s inequality and the AM-GM bound provide easy derivations. 
Nevertheless, it is Muirhead’s inequality which makes the bound most 
immediate and which embeds the bound in the richest context. 
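Symmetric sums of the form appearing in (12.18) can be evaluated by brute force over all $n!$ permutations, which gives a quick way to sanity-check small cases such as (12.16) and (12.19). A minimal sketch: the names `muirhead_sum` and `bound_1219` are our own, the cost grows like $n!$ so it is meant only for small $n$, and because exponents may be negative or fractional the variables must be positive.

```python
from itertools import permutations
from math import prod, sqrt


def muirhead_sum(x, exponents):
    """Sum over all permutations sigma of x[sigma(1)]**a_1 * ... * x[sigma(n)]**a_n."""
    n = len(x)
    if len(exponents) != n:
        raise ValueError("x and exponents must have the same length")
    return sum(
        prod(x[s[i]] ** exponents[i] for i in range(n))
        for s in permutations(range(n))
    )


def bound_1219(x, y, z):
    """Both sides of (12.19), computed directly rather than as Muirhead sums."""
    left = 2 * (sqrt(x * y) + sqrt(x * z) + sqrt(y * z))
    right = x**2 / y + x**2 / z + y**2 / x + y**2 / z + z**2 / x + z**2 / y
    return left, right


point = (1.3, 0.7, 2.1)
print(muirhead_sum(point, (2, 3, 0)), muirhead_sum(point, (1, 4, 0)))
```

Comparing `muirhead_sum(point, (0.5, 0.5, 0))` with the left side of (12.19) shows how each unordered pair appears twice in the $\alpha$-sum, which is where the factor 2 comes from.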


PROOF OF MUIRHEAD’S INEQUALITY 

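We were led to conjecture Muirhead’s inequality by the solution of our second challenge problem, so we naturally hope to prove it by leaning on our earlier argument. First, just to make the hypothesis $\alpha \in H(\beta)$ concrete, we note that it is equivalent to the assertion that

$$(\alpha_1, \alpha_2, \ldots, \alpha_n) = \sum_{\tau \in S_n} p_\tau \,(\beta_{\tau(1)}, \beta_{\tau(2)}, \ldots, \beta_{\tau(n)}) \quad \text{where } p_\tau \ge 0 \text{ and } \sum_{\tau \in S_n} p_\tau = 1.$$

Now, if we use the $j$th coordinate of this identity to express $x_{\sigma(j)}^{\alpha_j}$ as a product, then we can take the product over all $j$ to obtain the identity

$$x_{\sigma(1)}^{\alpha_1} x_{\sigma(2)}^{\alpha_2} \cdots x_{\sigma(n)}^{\alpha_n} = \prod_{\tau \in S_n} \left( x_{\sigma(1)}^{\beta_{\tau(1)}} x_{\sigma(2)}^{\beta_{\tau(2)}} \cdots x_{\sigma(n)}^{\beta_{\tau(n)}} \right)^{p_\tau}.$$

From this point the AM-GM inequality and arithmetic do the rest of the work. In particular, we have

$$\begin{aligned}
\sum_{\sigma \in S_n} x_{\sigma(1)}^{\alpha_1} \cdots x_{\sigma(n)}^{\alpha_n}
&\le \sum_{\sigma \in S_n} \sum_{\tau \in S_n} p_\tau\, x_{\sigma(1)}^{\beta_{\tau(1)}} x_{\sigma(2)}^{\beta_{\tau(2)}} \cdots x_{\sigma(n)}^{\beta_{\tau(n)}} \\
&= \sum_{\tau \in S_n} p_\tau \sum_{\sigma \in S_n} x_{\sigma(1)}^{\beta_{\tau(1)}} x_{\sigma(2)}^{\beta_{\tau(2)}} \cdots x_{\sigma(n)}^{\beta_{\tau(n)}} \\
&= \sum_{\tau \in S_n} p_\tau \sum_{\sigma \in S_n} x_{\sigma(1)}^{\beta_1} x_{\sigma(2)}^{\beta_2} \cdots x_{\sigma(n)}^{\beta_n} \\
&= \sum_{\sigma \in S_n} x_{\sigma(1)}^{\beta_1} x_{\sigma(2)}^{\beta_2} \cdots x_{\sigma(n)}^{\beta_n},
\end{aligned}$$

and, as one surely hoped, the two ends of this chain give us Muirhead’s inequality (12.18). The only step that asks for a moment’s thought is the reindexing in the third line: for fixed $\tau$, the substitution $i = \tau(j)$ turns the inner product into $\prod_{i} x_{\sigma(\tau^{-1}(i))}^{\beta_i}$, and $\sigma \mapsto \sigma \circ \tau^{-1}$ maps $S_n$ one-to-one onto itself.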
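Two features of this chain can be watched numerically: the $\beta$-sum does not change when the coordinates of $\beta$ are permuted (the reindexing step), and any $\alpha$ built as a convex combination $\sum_\tau p_\tau(\beta_{\tau(1)}, \ldots, \beta_{\tau(n)})$ yields an $\alpha$-sum no larger than the $\beta$-sum. This is only an illustrative sketch; the sample point, the vector $\beta$, the random weights, and the helper names are arbitrary choices of ours.

```python
from itertools import permutations
from math import prod
import random


def muirhead_sum(x, exponents):
    """Sum over sigma in S_n of prod_j x[sigma(j)] ** exponents[j]."""
    n = len(x)
    return sum(prod(x[s[j]] ** exponents[j] for j in range(n)) for s in permutations(range(n)))


def permute(beta, tau):
    """The point (beta[tau(1)], ..., beta[tau(n)]) from the orbit of beta (0-based tau)."""
    return tuple(beta[tau[j]] for j in range(len(beta)))


def random_point_of_hull(beta, terms, rng):
    """A random alpha in H(beta): sum of p_tau * permute(beta, tau), p_tau >= 0 summing to 1."""
    n = len(beta)
    weights = [rng.random() for _ in range(terms)]
    total = sum(weights)
    alpha = [0.0] * n
    for w in weights:
        tau = rng.sample(range(n), n)          # a random permutation of 0..n-1
        point = permute(beta, tau)
        for j in range(n):
            alpha[j] += (w / total) * point[j]
    return tuple(alpha)


x = (0.4, 1.7, 2.5, 0.9)
beta = (3.0, 1.0, 0.5, -0.5)
alpha = random_point_of_hull(beta, 5, random.Random(1))
print(muirhead_sum(x, alpha), "<=", muirhead_sum(x, beta))
```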


LOOKING BACK: BENEFITS OF SYMMETRY 


There is nothing difficult in the individual steps of the calculations that give us Muirhead’s inequality (12.18), but the sudden disappearance of the $p_\tau$ may seem like exceptionally good luck. To be sure, we are not strangers to the benefits that sometimes flow from changing the order of summation, but, as this example points out, those benefits can be particularly striking when symmetric sums are involved.

In many cases, dramatic simplifications arise simply from the observa- 
tion that “the permutation of a permutation is a permutation.” Some- 
times we need to check that a one-to-one correspondence works as we 
hope it should, but even this step just takes patience. The miracle is 
already in the mix. 

With experience, one finds that Muirhead’s inequality (12.18) is a remarkably effective tool for understanding the relations between symmetric sums. Nevertheless, applications of Muirhead’s inequality come at a price: somehow one must check the hypothesis $\alpha \in H(\beta)$. In many useful cases this can be done by inspection, but before Muirhead’s inequality can come into its own, one needs a systematic way to test Muirhead’s condition $\alpha \in H(\beta)$. Remarkably enough, there is an equivalent condition that lends itself to almost automatic checking. It is known as majorization, and it provides the central theme of our next chapter.


EXERCISES 


Exercise 12.1 (On Polynomials with Positive Roots) 
Show that if the real polynomial $P(x) = x^n + a_1 x^{n-1} + \cdots + a_{n-1}x + a_n$ has only positive roots, then one has the bound $n^2 |a_n| \le |a_1 a_{n-1}|$.
Exercise 12.2 (Three Muirhead Short Stories)
(a) Show that for nonnegative $a$, $b$, and $c$ one has

$$8abc \le (a+b)(b+c)(c+a). \tag{12.20}$$

(b) Show that for real $a_j$, $1 \le j \le n$, one has

$$2 \sum_{1 \le j < k \le n} a_j a_k \le (n-1) \sum_{j=1}^n a_j^2. \tag{12.21}$$

(c) Show that for nonnegative $a_j$, $1 \le j \le n$, one has

$$\binom{n}{2} (a_1 a_2 \cdots a_n)^{1/n} \le \sum_{1 \le j < k \le n} \sqrt{a_j a_k}. \tag{12.22}$$


Exercise 12.3 (The Homogenization Trick) 
Show that if the positive quantities $x$, $y$, and $z$ satisfy the relation $xyz = 1$ then one has the inequality

$$x^2 + y^2 + z^2 \le x^3 + y^3 + z^3. \tag{12.23}$$


The salient feature of this bound is that the left side is homogeneous 
of order 2 but the right side is homogeneous of order 3. Somehow the 
constraint xyz = 1 must make up for this incompatibility. 

It may be unclear how to exploit the constraint xyz = 1, but one trick 
which works remarkably often is to use the side condition to construct a 
homogeneous problem which generalizes the problem at hand. One then 
solves the homogeneous problem with the help of Muirhead’s inequality 
or related tools. 


Exercise 12.4 (Power Sum Inequalities) 
Show that for positive numbers $x_k$, $1 \le k \le n$, the power sums defined by $S_m(x) = x_1^m + x_2^m + \cdots + x_n^m$ satisfy the bounds

$$S_m^2(x) \le S_{m-1}(x) S_{m+1}(x) \quad \text{for all } m = 1, 2, \ldots. \tag{12.24}$$

These may remind us of Newton’s inequalities, but they are more elementary. They also tell us that the sequence $\{\log S_m(x)\}$ is convex, while Newton’s inequalities tell us that $\{\log E_m(x)\}$ is concave.


Exercise 12.5 (Symmetric Problems & Symmetric Solutions) 
Consider a real symmetric polynomial $p(x, y)$ such that $p(x, y) \to \infty$ as $|x| \to \infty$ and $|y| \to \infty$. It is reasonable to suspect that $p$ attains its minimum at a “symmetric point.” That is, one might conjecture that there is a $t \in \mathbb{R}$ such that

$$p(t, t) = \min_{x, y} p(x, y).$$


This conjecture was proved for polynomials of degree three or less 
by Victor Yacovlevich Bunyakovsky in 1854, some five years before the 
publication of his famous Mémoire on integral inequalities. Bunyakovsky 
also provided a counterexample which shows that the conjecture is false 
for a polynomial with degree four. Can you find such an example? 


Exercise 12.6 (Symmetry — Destroyed by Design) 

Participants in the 1999 Canadian Olympiad were asked to show that if $x$, $y$, and $z$ are nonnegative real numbers for which $x + y + z = 1$, then one has the bound

$$f(x, y, z) = x^2y + y^2z + z^2x \le \frac{4}{27}.$$

As a hint, first check by calculus that $f(x, y, 0)$ is maximized on the set $x + y = 1$ by taking $x = 2/3$ and $y = 1/3$, so the crucial step is to show that without loss of generality one can assume that $z = 0$.


Exercise 12.7 (Creative Bunching) 

A problem in the popular text Probability by Jim Pitman requires one to show in essence that if $x$, $y$, and $z$ are nonnegative real numbers for which $x + y + z = 1$, then

$$\frac{1}{4} \le x^3 + y^3 + z^3 + 6xyz.$$


Can you check this bound? Can you check it in more than one way? 


Exercise 12.8 (Weierstrass’s Polynomial Product Inequality) 
Show that if the complex numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ satisfy $|a_j| \le 1$ and $|b_j| \le 1$ for all $1 \le j \le n$ then

$$|a_1 a_2 \cdots a_n - b_1 b_2 \cdots b_n| \le \sum_{j=1}^n |a_j - b_j|. \tag{12.25}$$


13 


Majorization and Schur Convexity 


Majorization and Schur convexity are two of the most productive con- 
cepts in the theory of inequalities. They unify our understanding of 
many familiar bounds, and they point us to great collections of results 
which are only dimly sensed without their help. Although majorization 
and Schur convexity take a few paragraphs to explain, one finds with 
experience that both notions are stunningly simple. Still, they are not as 
well known as they should be, and they can become one’s secret weapon. 


TWO BARE-BONES DEFINITIONS


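Given an $n$-tuple $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_n)$, we let $\gamma_{[j]}$, $1 \le j \le n$, denote the $j$th largest of the $n$ coordinates, so $\gamma_{[1]} = \max\{\gamma_j : 1 \le j \le n\}$, and in general one has $\gamma_{[1]} \ge \gamma_{[2]} \ge \cdots \ge \gamma_{[n]}$. Now, for any pair of real $n$-tuples $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$, we say that $\alpha$ is majorized by $\beta$ and we write $\alpha \prec \beta$ provided that $\alpha$ and $\beta$ satisfy the following system of $n - 1$ inequalities:

$$\begin{aligned}
\alpha_{[1]} &\le \beta_{[1]}, \\
\alpha_{[1]} + \alpha_{[2]} &\le \beta_{[1]} + \beta_{[2]}, \\
&\;\;\vdots \\
\alpha_{[1]} + \alpha_{[2]} + \cdots + \alpha_{[n-1]} &\le \beta_{[1]} + \beta_{[2]} + \cdots + \beta_{[n-1]},
\end{aligned}$$

together with one final equality:

$$\alpha_{[1]} + \alpha_{[2]} + \cdots + \alpha_{[n]} = \beta_{[1]} + \beta_{[2]} + \cdots + \beta_{[n]}.$$

Thus, for example, we have the majorizations

$$(1,1,1,1) \prec (2,1,1,0) \prec (3,1,0,0) \prec (4,0,0,0) \tag{13.1}$$

and, since the definition of the relation $\alpha \prec \beta$ depends only on the corresponding ordered values, $\{\alpha_{[j]}\}$ and $\{\beta_{[j]}\}$, we could just as well write the chain (13.1) as

$$(1,1,1,1) \prec (0,1,1,2) \prec (1,3,0,0) \prec (0,0,4,0).$$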


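To give a more generic example, one should also note that for any $(\alpha_1, \alpha_2, \ldots, \alpha_n)$ with nonnegative coordinates we have the two relations

$$(\bar\alpha, \bar\alpha, \ldots, \bar\alpha) \prec (\alpha_1, \alpha_2, \ldots, \alpha_n) \prec (\alpha_1 + \alpha_2 + \cdots + \alpha_n, 0, \ldots, 0)$$

where, as usual, we have set $\bar\alpha = (\alpha_1 + \alpha_2 + \cdots + \alpha_n)/n$. Moreover, it is immediate from the definition of majorization that the relation $\prec$ is transitive: $\alpha \prec \beta$ and $\beta \prec \gamma$ imply that $\alpha \prec \gamma$. Consequently, the 4-chain (13.1) actually entails six valid relations.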
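Because the definition only involves partial sums of the decreasingly sorted coordinates, majorization is easy to test mechanically. In the sketch below, the name `is_majorized` and its `tol` parameter (which absorbs floating-point rounding) are our own choices; it is applied to the chain (13.1), to its six consequences, and to the two generic relations for a vector with nonnegative coordinates.

```python
from itertools import accumulate, combinations


def is_majorized(alpha, beta, tol=1e-12):
    """Return True when alpha is majorized by beta.

    Sort both tuples in decreasing order, form partial sums, and require the
    first n - 1 partial sums of alpha to be at most those of beta, with equal
    totals.
    """
    if len(alpha) != len(beta):
        raise ValueError("alpha and beta must have the same length")
    if not alpha:
        return True
    a = list(accumulate(sorted(alpha, reverse=True)))
    b = list(accumulate(sorted(beta, reverse=True)))
    return all(s <= t + tol for s, t in zip(a[:-1], b[:-1])) and abs(a[-1] - b[-1]) <= tol


CHAIN = [(1, 1, 1, 1), (2, 1, 1, 0), (3, 1, 0, 0), (4, 0, 0, 0)]
# Transitivity turns the 4-chain (13.1) into six relations, one for each pair i < j.
SIX_RELATIONS = list(combinations(CHAIN, 2))

print(all(is_majorized(a, b) for a, b in SIX_RELATIONS))
```

The check of `(1, -1)` against `(0, 0)` in the tests shows why nonnegativity matters for the second generic relation.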

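Now, if $A \subset \mathbb{R}^d$ and $f : A \to \mathbb{R}$, we say that $f$ is Schur convex on $A$ provided that we have

$$f(\alpha) \le f(\beta) \quad \text{for all } \alpha, \beta \in A \text{ for which } \alpha \prec \beta. \tag{13.2}$$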


Such a function might more aptly be called Schur monotone rather than 
Schur convex, but the term Schur convex is now firmly rooted in tradi- 
tion. By the same custom, if the first inequality of the relation (13.2) is 
reversed, we say that f is Schur concave on A. 


THE TYPICAL PATTERN AND A PRACTICAL CHALLENGE 


If we were to follow our usual pattern, we would now call on some 
concrete problem to illustrate how majorization and Schur convexity 
are used in practice. For example, we might consider the assertion that 
for positive $a$, $b$, and $c$, one has the reciprocal bound

$$\frac{1}{a} + \frac{1}{b} + \frac{1}{c} \le \frac{1}{x} + \frac{1}{y} + \frac{1}{z} \tag{13.3}$$

where $x = b + c - a$, $y = a + c - b$, $z = a + b - c$, and where we assume that $x$, $y$, and $z$ are strictly positive.

This slightly modified version of the American Mathematical Monthly problem E2284 of Walker (1971) is a little tricky if approached from first principles, yet we will find shortly that it is an immediate consequence of the Schur convexity of the map $(t_1, t_2, t_3) \mapsto 1/t_1 + 1/t_2 + 1/t_3$ and the majorization $(a, b, c) \prec (x, y, z)$.

Nevertheless, before we can apply majorization and Schur convexity to 
problems like E2284, we need to develop some machinery. In particular, 
we need a practical way to check that a function is Schur convex. The 
method we consider was introduced by Issai Schur in 1923, but even now 
it accounts for a hefty majority of all such verifications. 
Problem 13.1 (Schur’s Criterion)

Given that the function $f : (a, b)^n \to \mathbb{R}$ is continuously differentiable and symmetric, show that it is Schur convex on $(a, b)^n$ if and only if for all $1 \le j < k \le n$ and all $x \in (a, b)^n$ one has

$$0 \le (x_j - x_k)\left( \frac{\partial f(x)}{\partial x_j} - \frac{\partial f(x)}{\partial x_k} \right). \tag{13.4}$$


AN ORIENTING EXAMPLE 
Schur’s condition may be unfamiliar, but there is no mystery to its application. For example, if we consider the function

$$f(t_1, t_2, t_3) = 1/t_1 + 1/t_2 + 1/t_3$$

which featured in our discussion of Walker’s inequality (13.3), then one easily computes

$$(t_j - t_k)\left( \frac{\partial f}{\partial t_j} - \frac{\partial f}{\partial t_k} \right) = (t_j - t_k)\left( \frac{1}{t_k^2} - \frac{1}{t_j^2} \right).$$

For positive $t_j$ and $t_k$ this quantity is nonnegative, since $(t_j, t_k)$ and $(1/t_j^2, 1/t_k^2)$ are oppositely ordered, and, accordingly, the function $f$ is Schur convex on $(0, \infty)^3$.
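Schur’s condition (13.4) can also be probed numerically with central differences, a handy way to catch sign slips before attempting a proof. A minimal sketch: the helper names and the step `h` are arbitrary choices of ours, and sampled points illustrate rather than establish Schur convexity. For comparison it also includes the product $x_1 x_2 \cdots x_n$, whose Schur differential should come out nonpositive.

```python
from math import prod
import random


def partial(f, x, j, h=1e-6):
    """Central-difference approximation to the j-th partial derivative of f at x."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    return (f(up) - f(down)) / (2 * h)


def schur_differential(f, x, j, k, h=1e-6):
    """Approximate (x_j - x_k) * (df/dx_j - df/dx_k), the quantity in (13.4)."""
    return (x[j] - x[k]) * (partial(f, x, j, h) - partial(f, x, k, h))


def reciprocal_sum(t):
    """f(t_1, t_2, t_3) = 1/t_1 + 1/t_2 + 1/t_3, the function behind Walker's inequality."""
    return sum(1.0 / ti for ti in t)


def product(x):
    """f(x_1, ..., x_n) = x_1 * ... * x_n; its Schur differential should be <= 0."""
    return prod(x)


rng = random.Random(1)
t = [rng.uniform(0.5, 5.0) for _ in range(3)]
print(schur_differential(reciprocal_sum, t, 0, 1), schur_differential(product, t, 0, 1))
```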


INTERPRETATION OF A DERIVATIVE CONDITION 


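Since the condition (13.4) contains only first-order derivatives, it may refer to the monotonicity of something; the question is, what? The answer may not be immediate, but the partial sums in the defining conditions of majorization do provide a hint.

Given an $n$-tuple $w = (w_1, w_2, \ldots, w_n)$, it will be convenient to write $\tilde{w}_j = w_1 + w_2 + \cdots + w_j$ and to set $\tilde{w} = (\tilde{w}_1, \tilde{w}_2, \ldots, \tilde{w}_n)$. In this notation we see that, for $n$-tuples $x$ and $y$ whose coordinates are in decreasing order, the majorization $x \prec y$ holds if and only if we have $\tilde{x}_j \le \tilde{y}_j$ for all $1 \le j < n$ and $\tilde{x}_n = \tilde{y}_n$. One benefit of this “tilde transformation” is that it makes majorization look more like ordinary coordinate-by-coordinate comparison.

Now, since we have assumed that $f$ is symmetric, we know that $f$ is Schur convex on $(a, b)^n$ if and only if it is Schur convex on the set $B = (a, b)^n \cap D$ where $D = \{(x_1, x_2, \ldots, x_n) : x_1 \ge x_2 \ge \cdots \ge x_n\}$. Also, if we introduce the set $\tilde{B} = \{\tilde{x} : x \in B\}$, then we can define a new function $\tilde{f} : \tilde{B} \to \mathbb{R}$ by setting $\tilde{f}(\tilde{x}) = f(x)$ for all $\tilde{x} \in \tilde{B}$. The point of the new function $\tilde{f}$ is that it should translate the behavior of $f$ into the simpler language of the “tilde coordinates.”

The key observation is that $f(x) \le f(y)$ for all $x, y \in B$ with $x \prec y$ if and only if we have $\tilde{f}(\tilde{x}) \le \tilde{f}(\tilde{y})$ for all $\tilde{x}, \tilde{y} \in \tilde{B}$ such that

$$\tilde{x}_n = \tilde{y}_n \quad \text{and} \quad \tilde{x}_j \le \tilde{y}_j \quad \text{for all } 1 \le j < n.$$

That is, $f$ is Schur convex on $B$ if and only if the function $\tilde{f}$ on $\tilde{B}$ is a nondecreasing function of its first $n - 1$ coordinates.

Since we assume that $f$ is continuously differentiable, we therefore find that $f$ is Schur convex if and only if for each $\tilde{x}$ in the interior of $\tilde{B}$ we have

$$0 \le \frac{\partial \tilde{f}(\tilde{x})}{\partial \tilde{x}_j} \quad \text{for all } 1 \le j < n.$$

Further, because $\tilde{f}(\tilde{x}) = f(\tilde{x}_1, \tilde{x}_2 - \tilde{x}_1, \ldots, \tilde{x}_n - \tilde{x}_{n-1})$, the chain rule gives us

$$0 \le \frac{\partial \tilde{f}(\tilde{x})}{\partial \tilde{x}_j} = \frac{\partial f(x)}{\partial x_j} - \frac{\partial f(x)}{\partial x_{j+1}} \quad \text{for all } 1 \le j < n, \tag{13.5}$$

so, if we take $1 \le j < k \le n$ and sum the bound (13.5) over the indices $j, j+1, \ldots, k-1$, then we find

$$\frac{\partial f(x)}{\partial x_k} \le \frac{\partial f(x)}{\partial x_j} \quad \text{for all } x \in B.$$

By the symmetry of $f$ on $(a, b)^n$, this condition is equivalent to

$$0 \le (x_j - x_k)\left( \frac{\partial f(x)}{\partial x_j} - \frac{\partial f(x)}{\partial x_k} \right) \quad \text{for all } x \in (a, b)^n,$$

and the solution of the first challenge problem is complete.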
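The chain-rule identity (13.5) is easy to confirm numerically by building $\tilde f$ from $f$ exactly as in the text, $\tilde f(\tilde x) = f(\tilde x_1, \tilde x_2 - \tilde x_1, \ldots, \tilde x_n - \tilde x_{n-1})$. In this sketch the test function `g` is an arbitrary smooth and deliberately non-symmetric choice, since (13.5) itself does not use symmetry; all helper names are hypothetical, and derivatives are approximated by central differences.

```python
from itertools import accumulate
from math import exp


def tilde(w):
    """Partial sums (w_1, w_1 + w_2, ..., w_1 + ... + w_n)."""
    return list(accumulate(w))


def untilde(W):
    """Inverse of tilde: recover w_j = W_j - W_{j-1}, with W_0 = 0."""
    return [W[0]] + [W[i] - W[i - 1] for i in range(1, len(W))]


def make_f_tilde(f):
    """Return the function W -> f(untilde(W)), i.e. f expressed in tilde coordinates."""
    return lambda W: f(untilde(W))


def partial(f, x, j, h=1e-6):
    """Central-difference approximation to the j-th partial derivative of f at x."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    return (f(up) - f(down)) / (2 * h)


def g(x):
    """An arbitrary smooth test function of three variables."""
    return x[0] ** 2 + 3 * x[1] * x[2] + exp(x[2])


point = [1.0, 0.5, 0.2]
g_tilde = make_f_tilde(g)
print(partial(g_tilde, tilde(point), 0), partial(g, point, 0) - partial(g, point, 1))
```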


A LEADING CASE: AM-GM VIA SCHUR CONCAVITY


To see how Schur’s criterion works in a simple example, consider the function $f(x_1, x_2, \ldots, x_n) = x_1 x_2 \cdots x_n$, where $0 < x_j < \infty$ for $1 \le j \le n$. Here we see that Schur’s differential (13.4) is just

$$(x_j - x_k)(f_{x_j} - f_{x_k}) = -(x_j - x_k)^2 \,(x_1 \cdots x_{j-1} x_{j+1} \cdots x_{k-1} x_{k+1} \cdots x_n),$$

and this is always nonpositive. Therefore, $f$ is Schur concave.

We noted earlier that $(\bar{x}, \bar{x}, \ldots, \bar{x}) \prec x$ where $\bar{x}$ is the simple average $(x_1 + x_2 + \cdots + x_n)/n$, so the Schur concavity of $f$ then gives us $f(x) \le f(\bar{x}, \bar{x}, \ldots, \bar{x})$. In longhand, this says $x_1 x_2 \cdots x_n \le \bar{x}^n$, and this is the AM-GM inequality in its most classic form.

In this example, one does not use the full force of Schur convexity. In essence, we have used Jensen’s inequality in disguise, but there is still a message here: almost every invocation of Jensen’s inequality can be replaced by a call to Schur convexity. Surprisingly often, this simple translation brings useful dividends.


A SECOND TOOL: VECTORS AND THEIR AVERAGES 


This proof of the AM-GM inequality could hardly have been more 
automatic, but we were perhaps a bit lucky to have known in advance 
that $(\bar{x}, \bar{x}, \ldots, \bar{x}) \prec x$. Any application of Schur convexity (or Schur concavity)
must begin with a majorization relation, but we cannot always count on 
having the required relation in our inventory. Moreover, there are times 
when the definition of majorization is not so easy to check. 

For example, to complete our proof of Walker’s inequality (13.3), we need to show that $(a, b, c) \prec (x, y, z)$, but since we do not have any information on the relative sizes of these coordinates, the direct verification of the definition is awkward. The next challenge problem provides a useful tool for dealing with this common situation.


Problem 13.2 (Muirhead Implies Majorization)
Show that Muirhead’s condition implies that $\alpha$ is majorized by $\beta$; that is, show that one has the implication

$$\alpha \in H(\beta) \implies \alpha \prec \beta. \tag{13.6}$$


FROM MUIRHEAD’S CONDITION TO A SPECIAL REPRESENTATION 


Here we should first recall that the notation $\alpha \in H(\beta)$ simply means that there are nonnegative weights $p_\tau$ which sum to 1 for which we have

$$(\alpha_1, \alpha_2, \ldots, \alpha_n) = \sum_{\tau \in S_n} p_\tau (\beta_{\tau(1)}, \beta_{\tau(2)}, \ldots, \beta_{\tau(n)})$$

or, in other words, $\alpha$ is a weighted average of $(\beta_{\tau(1)}, \beta_{\tau(2)}, \ldots, \beta_{\tau(n)})$ as $\tau$ runs over the set $S_n$ of permutations of $\{1, 2, \ldots, n\}$. If we take just the $j$th component of this sum, then we find the identity

$$\alpha_j = \sum_{\tau \in S_n} p_\tau \beta_{\tau(j)} = \sum_{k=1}^n \Big( \sum_{\tau : \tau(j) = k} p_\tau \Big) \beta_k = \sum_{k=1}^n d_{jk} \beta_k \tag{13.7}$$

where for brevity we have set

$$d_{jk} = \sum_{\tau : \tau(j) = k} p_\tau \tag{13.8}$$

and where the sum (13.8) runs over all permutations $\tau \in S_n$ for which $\tau(j) = k$. We obviously have $d_{jk} \ge 0$, and we also have the identities

$$\sum_{j=1}^n d_{jk} = 1 \quad \text{and} \quad \sum_{k=1}^n d_{jk} = 1 \tag{13.9}$$

since each of these sums equals the sum of $p_\tau$ over all of $S_n$.

A matrix $D = \{d_{jk}\}$ of nonnegative real numbers which satisfies the conditions (13.9) is said to be doubly stochastic because each of its rows and each of its columns can be viewed as a probability distribution on the set $\{1, 2, \ldots, n\}$. Doubly stochastic matrices will be found to provide a fundamental link between majorization and Muirhead’s condition.

If we regard $\alpha$ and $\beta$ as column vectors, then in matrix notation the relation (13.7) says that

$$\alpha \in H(\beta) \implies \alpha = D\beta \tag{13.10}$$

where $D$ is the doubly stochastic matrix defined by the sums (13.8). Now, to complete the solution of the second challenge problem we just need to show that the representation $\alpha = D\beta$ implies $\alpha \prec \beta$.
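The passage from the weights $p_\tau$ to the matrix $D$ in (13.8) is a simple accumulation, and exact rational arithmetic makes the doubly stochastic conditions (13.9) visible without rounding. A sketch using Python’s standard `fractions.Fraction`; permutations are stored 0-based as tuples `tau` with `tau[j]` standing for $\tau(j)$, and the function names are hypothetical.

```python
from fractions import Fraction


def doubly_stochastic_from_weights(weights, n):
    """Build D with d_{jk} = sum of p_tau over all tau with tau(j) = k, as in (13.8).

    `weights` maps 0-based permutation tuples tau to nonnegative p_tau summing to 1.
    """
    if sum(weights.values()) != 1 or any(p < 0 for p in weights.values()):
        raise ValueError("weights must be nonnegative and sum to 1")
    D = [[Fraction(0)] * n for _ in range(n)]
    for tau, p in weights.items():
        for j in range(n):
            D[j][tau[j]] += p
    return D


def mat_vec(D, beta):
    """alpha = D beta, the matrix form (13.10) of the coordinate identity (13.7)."""
    return [sum(d * b for d, b in zip(row, beta)) for row in D]


def convex_combination(weights, beta):
    """alpha = sum over tau of p_tau * (beta_{tau(1)}, ..., beta_{tau(n)}), computed directly."""
    n = len(beta)
    alpha = [Fraction(0)] * n
    for tau, p in weights.items():
        for j in range(n):
            alpha[j] += p * beta[tau[j]]
    return alpha


# Two cyclic shifts with equal weight give the matrix with zero diagonal and 1/2 elsewhere.
print(doubly_stochastic_from_weights({(1, 2, 0): Fraction(1, 2), (2, 0, 1): Fraction(1, 2)}, 3))
```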

FROM THE REPRESENTATION $\alpha = D\beta$ TO THE MAJORIZATION $\alpha \prec \beta$


Since the relations $\alpha \in H(\beta)$ and $\alpha \prec \beta$ are unaffected by permutations of the coordinates of $\alpha$ and $\beta$, there is no loss of generality if we assume that $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$ and $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$. If we then sum the representation (13.7) over the initial segment $1 \le j \le k$, then we find the identity

$$\sum_{j=1}^k \alpha_j = \sum_{j=1}^k \sum_{t=1}^n d_{jt} \beta_t = \sum_{t=1}^n c_t \beta_t \quad \text{where} \quad c_t \overset{\text{def}}{=} \sum_{j=1}^k d_{jt}. \tag{13.11}$$

Since $c_t$ is the sum of the first $k$ elements of the $t$th column of $D$, the fact that $D$ is doubly stochastic then gives us

$$0 \le c_t \le 1 \quad \text{for all } 1 \le t \le n \quad \text{and} \quad c_1 + c_2 + \cdots + c_n = k. \tag{13.12}$$

These constraints strongly suggest that the differences

$$\Delta_k \overset{\text{def}}{=} \sum_{j=1}^k \alpha_j - \sum_{j=1}^k \beta_j = \sum_{t=1}^n c_t \beta_t - \sum_{j=1}^k \beta_j$$

are nonpositive for each $1 \le k \le n$, but an honest proof can be elusive. One must somehow exploit the identity (13.12), and a simple (yet clever) way is to write

$$\begin{aligned}
\Delta_k &= \sum_{j=1}^n c_j \beta_j - \sum_{j=1}^k \beta_j + \beta_k \Big( k - \sum_{j=1}^n c_j \Big) \\
&= \sum_{j=1}^k (\beta_k - \beta_j)(1 - c_j) + \sum_{j=k+1}^n c_j (\beta_j - \beta_k).
\end{aligned}$$

It is now evident that $\Delta_k \le 0$, since $0 \le c_j \le 1$ for all $j$, while for all $1 \le j \le k$ we have $\beta_j \ge \beta_k$ and for all $k < j \le n$ we have $\beta_j \le \beta_k$. It is trivial that $\Delta_n = 0$, so the relations $\Delta_k \le 0$ for $1 \le k < n$ complete our check of the definition. We therefore find that $\alpha \prec \beta$, and the solution of the second challenge problem is complete.
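The clever rewriting of $\Delta_k$ is pure algebra once (13.12) holds, so it can be checked exactly. The sketch below draws a doubly stochastic $D$ as a random convex combination of permutation matrices (an illustrative sampling scheme of ours, not a uniform distribution on such matrices), takes a decreasing $\beta$, orders the rows of $D$ so that $\alpha = D\beta$ is decreasing as the text assumes, and compares $\Delta_k$ with the two-sum expression. All helper names are hypothetical.

```python
from fractions import Fraction
import random


def random_doubly_stochastic(n, terms, rng):
    """Average `terms` random n x n permutation matrices with random positive rational weights."""
    raw = [rng.randint(1, 9) for _ in range(terms)]
    total = sum(raw)
    D = [[Fraction(0)] * n for _ in range(n)]
    for w in raw:
        tau = rng.sample(range(n), n)                 # a random permutation of 0..n-1
        for j in range(n):
            D[j][tau[j]] += Fraction(w, total)
    return D


def sort_rows_by_alpha(D, beta):
    """Reorder the rows of D so that alpha = D beta is decreasing (still doubly stochastic)."""
    return sorted(D, key=lambda row: sum(d * b for d, b in zip(row, beta)), reverse=True)


def delta_and_rewrite(D, beta, k):
    """Return (Delta_k, rewritten form of Delta_k) for alpha = D beta and 1 <= k <= n."""
    n = len(beta)
    alpha = [sum(D[j][t] * beta[t] for t in range(n)) for j in range(n)]
    c = [sum(D[j][t] for j in range(k)) for t in range(n)]   # the c_t of (13.11)
    delta = sum(alpha[:k]) - sum(beta[:k])
    b_k = beta[k - 1]                                         # beta_k, converted to 0-based indexing
    rewrite = (sum((b_k - beta[j]) * (1 - c[j]) for j in range(k))
               + sum(c[j] * (beta[j] - b_k) for j in range(k, n)))
    return delta, rewrite


rng = random.Random(0)
beta = [Fraction(5), Fraction(2), Fraction(-1)]
D = sort_rows_by_alpha(random_doubly_stochastic(3, 4, rng), beta)
print([delta_and_rewrite(D, beta, k) for k in (1, 2, 3)])
```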


FINAL CONSIDERATION OF THE WALKER EXAMPLE 

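In Walker’s Monthly problem (page 192) we have the three identities $x = b + c - a$, $y = a + c - b$, $z = a + b - c$, so to confirm the relation $(a, b, c) \in H[(x, y, z)]$, one only needs to notice that

$$\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \frac{1}{2} \begin{pmatrix} y \\ z \\ x \end{pmatrix} + \frac{1}{2} \begin{pmatrix} z \\ x \\ y \end{pmatrix}. \tag{13.13}$$

By the implication (13.6), this tells us that $(a, b, c) \prec (x, y, z)$, so the proof of Walker’s inequality (13.3) is finally complete.

Our solution of the second challenge problem also tells us that the relation (13.13) implies that $(a, b, c)$ is the image of $(x, y, z)$ under some doubly stochastic transformation $D$, and it is sometimes useful to make such a representation explicit. Here, for example, we only need to express the identity (13.13) with permutation matrices and then collect terms:

$$\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} + \frac{1}{2}\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 & \frac12 & \frac12 \\ \frac12 & 0 & \frac12 \\ \frac12 & \frac12 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$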
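The explicit matrix in the last display can be checked with exact arithmetic, together with Walker’s inequality (13.3) itself on a sample triple. A small sketch; the sample values $a = 3$, $b = 4$, $c = 5$ are an arbitrary choice of ours for which $x$, $y$, $z$ are positive, and the helper names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
P1 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # sends (x, y, z) to (y, z, x)
P2 = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]   # sends (x, y, z) to (z, x, y)
# Average of the two permutation matrices: the doubly stochastic D of the display.
D = [[HALF * (P1[i][j] + P2[i][j]) for j in range(3)] for i in range(3)]


def mat_vec(M, v):
    """Matrix-vector product for small lists of lists."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def walker_sides(a, b, c):
    """Return (left side, right side, (x, y, z)) of (13.3) for positive integers a, b, c."""
    x, y, z = b + c - a, a + c - b, a + b - c
    if min(a, b, c, x, y, z) <= 0:
        raise ValueError("need a, b, c and x, y, z all positive")
    left = sum(Fraction(1, t) for t in (a, b, c))
    right = sum(Fraction(1, t) for t in (x, y, z))
    return left, right, (x, y, z)


left, right, xyz = walker_sides(3, 4, 5)
print(left, "<=", right, "and D maps", xyz, "to", mat_vec(D, xyz))
```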


A CONVERSE AND AN INTERMEDIATE CHALLENGE 


We now face an obvious question: Is it also true that $\alpha \prec \beta$ implies that $\alpha \in H(\beta)$? In due course, we will find that the answer is affirmative, but full justification of this fact will take several steps. Our next challenge problem addresses the most subtle of these. The result is due to the joint efforts of Hardy, Littlewood, and Pólya, and its solution requires a sustained effort. While working through it, one finds that majorization acquires new layers of meaning.

Problem 13.3 (The HLP Representation: $\alpha \prec \beta \Rightarrow \alpha = D\beta$)


Show that $\alpha \prec \beta$ implies that there exists a doubly stochastic matrix $D$ such that $\alpha = D\beta$.


Hardy, Littlewood, and Pólya came to this result because of their interests in mathematical inequalities, but, ironically, the concept of majorization was originally introduced by economists who were interested
in inequalities of a different sort — the inequalities of income which one 
finds in our society. Today, the role of majorization in mathematics far 
outstrips its role in economics, but consideration of income distribution 
can still add to our intuition. 


INCOME INEQUALITY AND ROBIN HOOD TRANSFORMATIONS 


Given a nation A we can gain some understanding of the distribution of income in that nation by setting $\alpha_1$ equal to the percentage of total income which is received by the top 10% of income earners, setting $\alpha_2$ equal to the percentage earned by the next 10%, and so on down to $\alpha_{10}$ which we set equal to the percentage of national income which is earned by the bottom 10% of earners. If $\beta$ is defined similarly for nation B, then the relation $\alpha \prec \beta$ has an economic interpretation; it asserts that income is more unevenly distributed in nation B than in nation A. In other words, the relation $\prec$ provides a measure of income inequality.

One benefit of this interpretation is that it suggests how one might try to prove that $\alpha \prec \beta$ implies that $\alpha = D\beta$ for some doubly stochastic transformation $D$. To make the income distribution of nation B more like the income distribution of nation A, one can simply draw on the philosophy of Robin Hood: one steals from the rich and gives to the poor. The technical task is to prove that this thievery can be done in scientifically correct proportions.


THE SIMPLEST CASE: n = 2 


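To see how such a Robin Hood transformation would work in the simplest case, we just take $\alpha = (\alpha_1, \alpha_2) = (\rho + \sigma, \rho - \sigma)$ and take $\beta = (\beta_1, \beta_2) = (\rho + \tau, \rho - \tau)$. There is no loss of generality in assuming $\alpha_1 \ge \alpha_2$, $\beta_1 \ge \beta_2$, and $\alpha_1 + \alpha_2 = \beta_1 + \beta_2$; moreover, there is no loss in assuming that $\alpha$ and $\beta$ have the indicated forms with $\sigma \ge 0$ and $\tau \ge 0$. The immediate benefit of this choice is that we have $\alpha \prec \beta$ if and only if $\sigma \le \tau$.

To find a doubly stochastic matrix $D$ that takes $\beta$ to $\alpha$ is now just a question of solving a linear system for the components of $D$. The system is overdetermined, but when $\tau > 0$ it does have a solution which one can confirm simply by checking the identity

$$D\beta = \begin{pmatrix} \frac{\tau + \sigma}{2\tau} & \frac{\tau - \sigma}{2\tau} \\ \frac{\tau - \sigma}{2\tau} & \frac{\tau + \sigma}{2\tau} \end{pmatrix} \begin{pmatrix} \rho + \tau \\ \rho - \tau \end{pmatrix} = \begin{pmatrix} \rho + \sigma \\ \rho - \sigma \end{pmatrix} = \alpha. \tag{13.14}$$

(When $\tau = 0$ we also have $\sigma = 0$, and $D$ can be taken to be the identity.)

Thus, the case $n = 2$ is almost trivial. Nevertheless, it is rich enough to suggest an interesting approach to the general case. Perhaps one can show that an $n \times n$ doubly stochastic matrix $D$ is the product of a finite number of transformations each one of which changes only two coordinates.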
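The $2 \times 2$ identity (13.14) is easy to confirm with exact arithmetic. In this sketch the names `robin_hood_2x2` and `apply_2x2` are hypothetical; the function rejects inputs outside the range $0 \le \sigma \le \tau$, $\tau > 0$ where the formula applies.

```python
from fractions import Fraction


def robin_hood_2x2(sigma, tau):
    """The 2x2 doubly stochastic matrix of (13.14).

    It sends (rho + tau, rho - tau) to (rho + sigma, rho - sigma) for every rho,
    provided 0 <= sigma <= tau and tau > 0.
    """
    s, t = Fraction(sigma), Fraction(tau)
    if t <= 0 or not 0 <= s <= t:
        raise ValueError("need 0 <= sigma <= tau and tau > 0")
    keep = (t + s) / (2 * t)    # diagonal entries: the share each coordinate keeps
    move = (t - s) / (2 * t)    # off-diagonal entries: the share sent to the other coordinate
    return [[keep, move], [move, keep]]


def apply_2x2(D, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [D[0][0] * v[0] + D[0][1] * v[1], D[1][0] * v[0] + D[1][1] * v[1]]


# rho = 5, tau = 3, sigma = 1: the rich coordinate 8 gives 2 to the poor coordinate 2.
print(apply_2x2(robin_hood_2x2(1, 3), [8, 2]))
```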


AN INDUCTIVE CONSTRUCTION 


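If we take $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$ and $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$ where $\alpha \prec \beta$, then we can consider a proof by induction on the number $N$ of coordinates $j$ such that $\alpha_j \ne \beta_j$. Naturally we can assume that $N \ge 1$, or else we can simply take $D$ to be the identity matrix.

Now, given $N \ge 1$, the definition of majorization implies that there must exist a pair of integers $1 \le j < k \le n$ for which we have the bounds

$$\beta_j > \alpha_j, \quad \beta_k < \alpha_k, \quad \text{and} \quad \beta_s = \alpha_s \ \text{for all } j < s < k. \tag{13.15}$$

Figure 13.1 gives a useful representation of this situation, the essence of which is that the interval $[\alpha_k, \alpha_j]$ is properly contained in the interval $[\beta_k, \beta_j]$. The intervening values $\alpha_s = \beta_s$ for $j < s < k$ are omitted from the figure to minimize clutter, but the figure records several further values that are important in our construction. In particular, it marks out $\rho = (\beta_j + \beta_k)/2$ and $\tau > 0$ which we choose so that $\beta_j = \rho + \tau$ and $\beta_k = \rho - \tau$, and it indicates the value $\sigma$ which is defined to be the maximum of $|\alpha_k - \rho|$ and $|\alpha_j - \rho|$.

[Figure 13.1: three rows of points labeled “$\beta'$ values,” “$\beta$ values,” and “$\alpha$ values”; the marked points include $\beta_k = \rho - \tau$, $\beta_j = \rho + \tau$, $\rho - \sigma$, $\rho$, $\rho + \sigma$, and $\alpha_n, \ldots, \alpha_k, \ldots, \alpha_j, \ldots, \alpha_2, \alpha_1$.]

Fig. 13.1. The value $\rho$ is the midpoint of $\beta_k = \rho - \tau$ and $\beta_j = \rho + \tau$ as well as the midpoint of $\rho - \sigma$ and $\rho + \sigma$. We have $0 \le \sigma < \tau$, and the figure shows the case when $|\alpha_k - \rho|$ is larger than $|\alpha_j - \rho|$.

We now take $T$ to be the $n \times n$ doubly stochastic transformation which takes $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$ to $\beta' = (\beta'_1, \beta'_2, \ldots, \beta'_n)$ where

$$\beta'_j = \rho + \sigma, \quad \beta'_k = \rho - \sigma, \quad \text{and} \quad \beta'_t = \beta_t \ \text{for all } t \ne j, \ t \ne k.$$

The matrix representation for $T$ is easily obtained from the matrix given by our $2 \times 2$ example. One just places the coefficients of the $2 \times 2$ matrix at the four coordinates of $T$ which are determined by the $j, k$ rows and the $j, k$ columns. The rest of the diagonal is then filled with $n - 2$ ones and then the remaining places are filled with $n^2 - n - 2$ zeros, so one comes at last to a matrix with the shape

$$T = \begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & \frac{\tau+\sigma}{2\tau} & \cdots & \frac{\tau-\sigma}{2\tau} & & \\
& & \vdots & \ddots & \vdots & & \\
& & \frac{\tau-\sigma}{2\tau} & \cdots & \frac{\tau+\sigma}{2\tau} & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix} \tag{13.16}$$

where the four displayed fractions occupy the positions $(j,j)$, $(j,k)$, $(k,j)$, and $(k,k)$.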

THE INDUCTION STEP 


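We are almost ready to appeal to the induction step, but we still need to check that $\alpha \prec \beta' = T\beta$. If we use $s_t(\gamma) = \gamma_1 + \gamma_2 + \cdots + \gamma_t$ to simplify the writing of partial sums, then we have three basic observations:

(a) $s_t(\alpha) \le s_t(\beta) = s_t(\beta')$ for $1 \le t < j$,
(b) $s_t(\alpha) \le s_t(\beta')$ for $j \le t < k$, and
(c) $s_t(\alpha) \le s_t(\beta) = s_t(\beta')$ for $k \le t \le n$, with equality when $t = n$.

Observations (a) and (c) are immediate, and to justify (b) we only need to note that $\alpha_j \le \beta'_j$ and to recall that $\alpha_t = \beta_t = \beta'_t$ for $j < t < k$.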

These bounds confirm that $\alpha \prec \beta'$ and, by the design of $T$, we know that the $n$-tuples $\alpha$ and $\beta'$ agree in all but at most $N - 1$ coordinates. Hence, by induction, there is a doubly stochastic matrix $D'$ such that $\alpha = D'\beta'$. Since $\beta' = T\beta$, we therefore have $\alpha = D'(T\beta) = (D'T)\beta$, and, since the product of two doubly stochastic matrices is doubly stochastic, we see that the matrix $D = D'T$ provides us with the solution to our challenge problem.
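The induction is constructive, so it translates directly into an algorithm: repeatedly locate the pair $j < k$ of (13.15), apply the Robin Hood matrix $T$ of (13.16), and accumulate the product of the $T$’s. This sketch assumes both tuples are already in decreasing order, uses exact `Fraction` arithmetic so that the termination test $\alpha = \beta'$ is reliable, and uses hypothetical function names. Each pass makes at least one more coordinate agree, so the loop terminates.

```python
from fractions import Fraction
from itertools import accumulate


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def hlp_matrix(alpha, beta):
    """Doubly stochastic D with alpha = D beta, built by Robin Hood steps.

    Assumes alpha and beta are in decreasing order and alpha is majorized by beta.
    """
    alpha = [Fraction(a) for a in alpha]
    beta = [Fraction(b) for b in beta]
    n = len(beta)
    if len(alpha) != n or alpha != sorted(alpha, reverse=True) or beta != sorted(beta, reverse=True):
        raise ValueError("alpha and beta must be decreasing tuples of equal length")
    sa, sb = list(accumulate(alpha)), list(accumulate(beta))
    if any(s > t for s, t in zip(sa, sb)) or sa[-1] != sb[-1]:
        raise ValueError("alpha is not majorized by beta")
    D = identity(n)                  # invariant: current beta == D times the original beta
    while alpha != beta:
        k = min(i for i in range(n) if beta[i] < alpha[i])     # first index with beta_k < alpha_k
        j = max(i for i in range(k) if beta[i] != alpha[i])    # last mismatch before k; beta_j > alpha_j
        rho, tau = (beta[j] + beta[k]) / 2, (beta[j] - beta[k]) / 2
        sigma = max(abs(alpha[k] - rho), abs(alpha[j] - rho))
        T = identity(n)              # the matrix (13.16)
        T[j][j] = T[k][k] = (tau + sigma) / (2 * tau)
        T[j][k] = T[k][j] = (tau - sigma) / (2 * tau)
        beta = mat_vec(T, beta)      # beta' = T beta
        D = mat_mul(T, D)            # the newest T acts on the left, as in D = D'T
    return D


print(hlp_matrix((3, 2, 1), (5, 1, 0)))
```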


JENSEN’S INEQUALITY: REVISITED AND REFINED 


The Hardy, Littlewood, Pólya representation $\alpha = D\beta$ is a statement about averages. Part of its message is that for each $j$ the value $\alpha_j$ is an average of $\beta_1, \beta_2, \ldots, \beta_n$, but the identity $\alpha = D\beta$ actually tells us a bit more. Specifically, we also know that each column of $D$ must sum to one, though for the moment it may not be clear how one might use this additional information.

We do know from our experience with Jensen’s inequality that aver- 
ages and convex functions can be combined to provide a large number of 
useful inequalities, and it is natural to ask if the representation $\alpha = D\beta$
might provide something even grander. Issai Schur confirmed this sug- 
gestion with a simple calculation which has become a classic part of the 
lore of majorization and which provides the final challenge problem of 
the chapter. 


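Problem 13.4 (Schur’s Majorization Inequality)
Show that if $\phi : (a, b) \to \mathbb{R}$ is a convex function, then the function $f : (a, b)^n \to \mathbb{R}$ defined by the sum

$$
f(x_1, x_2, \ldots, x_n) = \sum_{k=1}^{n} \phi(x_k) \qquad (13.17)
$$

is Schur convex. Thus, for $\alpha, \beta \in (a, b)^n$ with $\alpha \prec \beta$ one has

$$
\sum_{k=1}^{n} \phi(\alpha_k) \le \sum_{k=1}^{n} \phi(\beta_k). \qquad (13.18)
$$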

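ORIENTATION
To familiarize ourselves with the bound (13.18), one should first note that if we take $\alpha = (\bar{x}, \bar{x}, \ldots, \bar{x})$ and $\beta = (x_1, x_2, \ldots, x_n)$, where $\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$, then it reduces to Jensen’s inequality. Also, since the function $t \mapsto 1/t$ is convex on the set $(0, \infty)$,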


202 Majorization and Schur Convexity 


we see that Schur’s majorization bound (13.18) also implies Walker’s inequality (13.3), since we know now that the representation (13.13) implies $(a, b, c) \prec (x, y, z)$.

One should further note that if we assume that $\phi$ is differentiable, then the Schur convexity of $f$ follows almost immediately from the differential criterion (13.4). In particular, by the convexity of $\phi$ the derivative $\phi'$ is nondecreasing, so the pairs $(x_j, x_k)$ and $(\phi'(x_j), \phi'(x_k))$ are similarly ordered. Consequently, Schur’s differential

$$
(x_j - x_k)\big(f_{x_j}(\mathbf{x}) - f_{x_k}(\mathbf{x})\big) = (x_j - x_k)\big(\phi'(x_j) - \phi'(x_k)\big)
$$

is nonnegative, and $f$ is Schur convex. Part of our challenge problem is thus to prove the Schur convexity of $f$ without recourse to differentiability.


A DIRECT APPROACH


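To prove the bound (13.18) by a direct appeal to the convexity of $\phi$, one needs to find an appropriate average, and the HLP representation (page 198) is a natural place to look. We are given $\alpha \prec \beta$, so the HLP representation tells us that there is a doubly stochastic matrix $D = \{d_{jk}\}$ such that $\alpha = D\beta$, or, in longhand, for each $j = 1, 2, \ldots, n$ we have the representation

$$
\alpha_j = \sum_{k=1}^{n} d_{jk}\,\beta_k \quad \text{where} \quad d_{j1} + d_{j2} + \cdots + d_{jn} = 1.
$$

Now, if we apply Jensen’s inequality to these averages, we have

$$
\phi(\alpha_j) \le \sum_{k=1}^{n} d_{jk}\,\phi(\beta_k),
$$

and, except for the abstract quality of the $d_{jk}$ factors, this bound is better than the one we seek. In particular, if we sum over $j$ and change the order of summation, we find

$$
\sum_{j=1}^{n} \phi(\alpha_j) \le \sum_{j=1}^{n} \sum_{k=1}^{n} d_{jk}\,\phi(\beta_k) = \sum_{k=1}^{n} \Big(\sum_{j=1}^{n} d_{jk}\Big) \phi(\beta_k) = \sum_{k=1}^{n} \phi(\beta_k),
$$

since each column of $D$ sums to one, just as we hoped to show.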
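A quick numerical experiment makes the argument concrete. The sketch below uses only the standard library, and its helper names are hypothetical. It builds a doubly stochastic $D$ as a random convex combination of permutation matrices (doubly stochastic by construction), forms $\alpha = D\beta$, and checks both that $\alpha \prec \beta$ and that $\sum \phi(\alpha_k) \le \sum \phi(\beta_k)$ for a few convex $\phi$. A floating-point tolerance is used, so this is an illustration rather than a proof.

```python
import math
import random


def is_majorized(alpha, beta, tol=1e-9):
    """Return True if alpha ≺ beta (up to a floating-point tolerance)."""
    a = sorted(alpha, reverse=True)
    b = sorted(beta, reverse=True)
    if abs(sum(a) - sum(b)) > tol:
        return False
    partial_a = partial_b = 0.0
    for x, y in zip(a, b):
        partial_a += x
        partial_b += y
        if partial_a > partial_b + tol:
            return False
    return True


def random_doubly_stochastic(n, terms, rng):
    """A random convex combination of permutation matrices."""
    weights = [rng.random() + 1e-3 for _ in range(terms)]
    total = sum(weights)
    D = [[0.0] * n for _ in range(n)]
    for w in weights:
        perm = list(range(n))
        rng.shuffle(perm)
        for j in range(n):
            D[j][perm[j]] += w / total
    return D


def schur_check(phi, beta, D, tol=1e-9):
    """Return (alpha ≺ beta, sum phi(alpha) <= sum phi(beta)) for alpha = D beta."""
    alpha = [sum(d * b for d, b in zip(row, beta)) for row in D]
    return (is_majorized(alpha, beta, tol),
            sum(map(phi, alpha)) <= sum(map(phi, beta)) + tol)


rng = random.Random(2024)
convex_functions = [lambda t: t * t, math.exp, lambda t: abs(t - 0.3)]
for _ in range(500):
    n = rng.randint(2, 6)
    beta = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    D = random_doubly_stochastic(n, terms=rng.randint(1, 4), rng=rng)
    for phi in convex_functions:
        assert schur_check(phi, beta, D) == (True, True)
print("Schur's majorization inequality held in every trial")
```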

No one would deny that Schur’s majorization inequality (13.18) is a 
very easy result, but one should not be deceived by its simplicity. It 
strips away the secret of many otherwise mysterious bounds. 
A DAY-TO-DAY EXAMPLE


The final challenge addresses a typical example of the flood of prob- 
lems that one can solve — or invent — with help from the tools devel- 
oped in this chapter. 


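Problem 13.5 Given $x, y, z \in (0, 1)$ such that

$$
\max(x, y, z) \le \frac{x + y + z}{2} < 1, \qquad (13.19)
$$

show that one has the bound

$$
\left(\frac{1+x}{1-x}\right)\left(\frac{1+y}{1-y}\right)\left(\frac{1+z}{1-z}\right) \le \left(\frac{2 + x + y + z}{2 - x - y - z}\right)^{2}. \qquad (13.20)
$$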


If this problem were met in another context, it might be quite puzzling. 
It is not obvious that the two sides are comparable, and the hypothesis 
(13.19) is unlike anything we have seen before. Still, with majorization 
in mind, one may not need long to hit on a fruitful plan. 

In particular, one might think of exploiting the hypothesis (13.19) by noting that it gives us $(x, y, z) \prec (s, s, 0)$ where $s = (x + y + z)/2$. After this observation, it becomes clear that the bound (13.20) would follow from Schur’s majorization inequality (13.18) if we could show that

$$
\phi(t) = \log\left(\frac{1+t}{1-t}\right)
$$


is a convex function on $[0, 1)$. This is easily confirmed by direct calculation of the second derivative,

$$
\phi''(t) = \frac{4t}{(1 - t^{2})^{2}} \ge 0,
$$

but it is also obvious from the Taylor expansion

$$
\phi(t) = 2\left(t + \frac{t^{3}}{3} + \frac{t^{5}}{5} + \cdots\right).
$$
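As a sanity check (not a proof), one can sample triples that satisfy (13.19) and compare the two sides of (13.20) numerically. The helper names below are hypothetical, and the sampling is simple rejection sampling from the unit cube. The second assertion states the same comparison in logarithmic form, which is exactly the instance of (13.18) with $\beta = (s, s, 0)$.

```python
import math
import random


def phi(t):
    """phi(t) = log((1 + t)/(1 - t)), convex on [0, 1)."""
    return math.log((1 + t) / (1 - t))


def sides_of_13_20(x, y, z):
    """Return (left side, right side) of the bound (13.20)."""
    lhs = ((1 + x) / (1 - x)) * ((1 + y) / (1 - y)) * ((1 + z) / (1 - z))
    rhs = ((2 + x + y + z) / (2 - x - y - z)) ** 2
    return lhs, rhs


def sample_admissible(rng):
    """Rejection-sample (x, y, z) in (0, 1)^3 with max(x, y, z) <= s < 1, s = (x+y+z)/2."""
    while True:
        x, y, z = rng.random(), rng.random(), rng.random()
        s = (x + y + z) / 2
        if max(x, y, z) <= s < 1 and min(x, y, z) > 0:
            return x, y, z


rng = random.Random(7)
for _ in range(10_000):
    x, y, z = sample_admissible(rng)
    lhs, rhs = sides_of_13_20(x, y, z)
    assert lhs <= rhs * (1 + 1e-12)
    # Logarithmic form: phi(x) + phi(y) + phi(z) <= 2 phi(s) + phi(0).
    s = (x + y + z) / 2
    assert phi(x) + phi(y) + phi(z) <= 2 * phi(s) + phi(0) + 1e-12
```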


ILLUSTRATIVE EXERCISES AND A VESTIGE OF THEORY 


Most of the chapter’s exercises are designed to illustrate the applications of majorization and Schur convexity, but the last two exercises serve a different purpose. They are given to complete the picture of majorization theory that is illustrated by Figure 13.2. We have proved all of the implications pictured there except for the one which we have labelled as Birkhoff’s Theorem.

This famous theorem asserts that every doubly stochastic matrix is a convex combination of permutation matrices, and it closes the loop on the double implication $\alpha \prec \beta \Leftrightarrow \alpha \in H(\beta)$ asserting the equivalence of majorization and Muirhead’s condition. Most day-to-day applications of majorization do not require Birkhoff’s half of this equivalence, but Birkhoff’s theorem has applications throughout pure and applied mathematics. It is sometimes called the fundamental theorem of doubly stochastic matrices.

[Figure 13.2: a diagram linking the conditions $\alpha \in H(\beta)$, $\alpha = D\beta$, $\alpha \prec \beta$, and $\alpha = T_1 T_2 \cdots T_n \beta$, with arrows labelled “Birkhoff’s Theorem”, “Hardy, Littlewood, Pólya”, and “trivial”.]

Fig. 13.2. Sometimes the definition of $\alpha \prec \beta$ is easy to check, but perhaps more often one relies on either the condition $\alpha = D\beta$ or the condition $\alpha \in H(\beta)$ to prove majorization.


EXERCISES 


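Exercise 13.1 (Two Doubly Stochastic Giveaways)
Show that for positive $x, y, z$ one has the product bound

$$
xyz \le \left(\frac{x}{2} + \frac{y}{3} + \frac{z}{6}\right)\left(\frac{x}{3} + \frac{2z}{3}\right)\left(\frac{x}{6} + \frac{2y}{3} + \frac{z}{6}\right),
$$

and the awe-inspiring reciprocal bound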


ae 6 . 6 ome: ren 
g+y) © \8at+yt2z)/ ° \3r4+3y4z) ~ 28 ' yb ¢ od 


Exercise 13.2 (Finding the Majorization)
Given $1 \le k \le n$ and real numbers $x_j \ge 0$, $1 \le j \le n$, such that $\max(x_1, x_2, \ldots, x_n) \le (x_1 + x_2 + \cdots + x_n)/k$, show that one has

$$
\sum_{j=1}^{n} \frac{1}{1 + x_j} \le (n - k) + \frac{k^{2}}{k + x_1 + x_2 + \cdots + x_n}. \qquad (13.21)
$$


Exercise 13.3 (A Refinement of the 1-Trick)

Given integers $0 < m < n$ and real numbers $x_1, x_2, \ldots, x_n$ such that

$$
\sum_{k=1}^{m} x_k = \frac{m}{n} \sum_{k=1}^{n} x_k + \delta, \qquad (13.22)
$$

where $\delta > 0$, show that the sum of squares has the lower bound

$$
\sum_{k=1}^{n} x_k^{2} \ge \frac{1}{n} \Big(\sum_{k=1}^{n} x_k\Big)^{2} + \frac{\delta^{2} n}{m(n - m)}. \qquad (13.23)
$$

This refinement of the familiar 1-trick lower bound was crucial to the discovery and proof of Szemerédi’s Regularity Lemma, which is one of the cornerstones of modern combinatorial theory.


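Exercise 13.4 (Symmetric Polynomials and Schur Concavity)

After observing that the $k$th elementary symmetric function

$$
e_k(\mathbf{x}) = e_k(x_1, x_2, \ldots, x_n) = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} x_{i_1} x_{i_2} \cdots x_{i_k}
$$

satisfies the elegant “cancellation identity”

$$
\frac{\partial e_k(\mathbf{x})}{\partial x_s} = e_{k-1}(x_1, x_2, \ldots, x_{s-1}, x_{s+1}, \ldots, x_n), \qquad (13.24)
$$

show that $e_k(\mathbf{x})$ is Schur concave for $\mathbf{x} \in [0, \infty)^n$.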


Exercise 13.5 (Schur Concavity and Measures of Dispersion)
Many methods have been proposed to measure dispersion. Statisticians, for example, often use the sample variance

$$
s(\mathbf{x}) = \frac{1}{n-1} \sum_{j=1}^{n} (x_j - \bar{x})^{2} \quad \text{where} \quad \bar{x} = (x_1 + x_2 + \cdots + x_n)/n
$$

for $\mathbf{x} \in \mathbb{R}^n$, $n \ge 2$, while information theorists rely on the entropy

$$
h(\mathbf{p}) = -\sum_{k=1}^{n} p_k \log p_k
$$

to measure the dispersion of the probability distribution $(p_1, p_2, \ldots, p_n)$ where $p_k > 0$ and $p_1 + p_2 + \cdots + p_n = 1$. Show that the sample variance $s(\mathbf{x})$ is Schur convex, while the entropy $h(\mathbf{p})$ is Schur concave.
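The two halves of the exercise can be previewed numerically. The sketch below (hypothetical helper names, standard library only) applies a single T-transform that replaces two coordinates by convex combinations of each other, which produces a vector majorized by the original. It then checks that the sample variance does not increase while the entropy does not decrease, so the entropy behaves like a Schur concave function. It illustrates the claims; it does not replace the proof the exercise asks for.

```python
import math
import random


def sample_variance(x):
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)


def entropy(p):
    # Terms with p_k = 0 would contribute 0, by the usual convention.
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def t_step(x, i, j, lam):
    """Replace (x_i, x_j) by convex combinations of each other; lam in [0, 1].
    The result y satisfies y ≺ x."""
    y = list(x)
    y[i] = (1 - lam) * x[i] + lam * x[j]
    y[j] = lam * x[i] + (1 - lam) * x[j]
    return y


rng = random.Random(11)
for _ in range(2_000):
    n = rng.randint(2, 8)
    i, j = rng.sample(range(n), 2)
    lam = rng.random()
    x = [rng.uniform(-5, 5) for _ in range(n)]
    assert sample_variance(t_step(x, i, j, lam)) <= sample_variance(x) + 1e-9
    w = [rng.random() + 1e-6 for _ in range(n)]
    p = [wk / sum(w) for wk in w]
    assert entropy(t_step(p, i, j, lam)) >= entropy(p) - 1e-9
```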
Exercise 13.6 (Another Inversion Preserving Form)

If $p_k > 0$, $p_1 + p_2 + \cdots + p_n = 1$, and $0 < \alpha$, show that

$$
\frac{(n^{2} + 1)^{\alpha}}{n^{\alpha - 1}} \le \sum_{k=1}^{n} \left(p_k + \frac{1}{p_k}\right)^{\alpha}. \qquad (13.25)
$$

Incidentally, way back in Exercise 1.6 we used Cauchy’s inequality to deal with the case $\alpha = 1$. Remarkably often majorization helps one to put a consequence of Cauchy’s inequality into a broader context.


Exercise 13.7 (A Birthday Problem) 


Given $n$ random people, what is the probability that two or more of them have the same birthday? Under the natural (but approximate!) model where the birthdays are viewed as independent and uniformly distributed in the set $\{1, 2, \ldots, 365\}$, show that this probability is at least 1/2 if $n \ge 23$. For the more novel bit, show that this probability does not go down if one drops the assumption that the birthdays are uniformly distributed.
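The computation behind the exercise is easy to automate. If each birthday is drawn independently from a distribution $p = (p_1, \ldots, p_{365})$, then the probability that $n$ birthdays are all distinct is $n!\,e_n(p)$, where $e_n$ is the elementary symmetric function of Exercise 13.4; this is the link to Schur concavity. The sketch below (hypothetical helper names, floating-point arithmetic) evaluates $e_n$ with the standard one-pass recurrence and compares the uniform model with a deliberately lopsided one of our own choosing.

```python
import math


def elementary_symmetric(p, n):
    """e_n(p_1, ..., p_d) via the recurrence e_k <- e_k + p_i * e_{k-1}."""
    e = [1.0] + [0.0] * n
    for x in p:
        for k in range(n, 0, -1):  # descending, so each p_i is used at most once
            e[k] += x * e[k - 1]
    return e[n]


def prob_shared_birthday(p, n):
    """P(at least two of n independent draws from p coincide) = 1 - n! e_n(p)."""
    return 1.0 - math.factorial(n) * elementary_symmetric(p, n)


uniform = [1 / 365] * 365
# A lopsided (hypothetical) model: 100 "popular" days, each 1.5 times as likely.
popular = 1.5 / 365
rest = (1 - 100 * popular) / 265
lopsided = [popular] * 100 + [rest] * 265

for n in (22, 23):
    print(n, round(prob_shared_birthday(uniform, n), 4),
          round(prob_shared_birthday(lopsided, n), 4))
```

For the uniform model the same number can be computed from the familiar product $\prod_{i<n} (365 - i)/365$; the symmetric-function form is what generalizes to nonuniform $p$.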


Exercise 13.8 (SDRs and the Marriage Problem) 


If $S_1, S_2, \ldots, S_n$ is a collection of subsets of the set $S$, we say that the set $R = \{x_1, x_2, \ldots, x_n\} \subseteq S$ is a system of distinct representatives (or an SDR) provided that the elements of $R$ are all distinct and $x_k \in S_k$ for each $1 \le k \le n$. Prove that a necessary and sufficient condition for the existence of an SDR is that one has the inequality

$$
|A| \le \Big|\bigcup_{j \in A} S_j\Big| \quad \text{for all } A \subseteq \{1, 2, \ldots, n\}, \qquad (13.26)
$$

where $|C|$ is used as shorthand for the cardinality of a set $C$.


The quaint term “marriage problem” comes from a 1949 article by 
Hermann Weyl who essentially put the issue as follows: given a set of 
girls and boys, it is possible for each girl to marry a boy she knows if 
and only if each subset of k girls knows at least k boys. 
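For small instances, both sides of the marriage lemma can be checked by brute force. The sketch below is an illustration under our own choices, not part of the exercise: `has_sdr` searches for an SDR with the augmenting-path matching method often called Kuhn's algorithm, and `hall_condition` tests the inequality (13.26) over every nonempty $A$. On random small families the two answers should always agree.

```python
import random
from itertools import combinations


def has_sdr(sets):
    """Augmenting-path bipartite matching between set indices and elements."""
    owner = {}  # element -> index of the set it currently represents

    def try_assign(k, seen):
        for x in sets[k]:
            if x in seen:
                continue
            seen.add(x)
            # Use x if it is free, or if its current owner can be re-routed.
            if x not in owner or try_assign(owner[x], seen):
                owner[x] = k
                return True
        return False

    return all(try_assign(k, set()) for k in range(len(sets)))


def hall_condition(sets):
    """Check |A| <= |union of S_j over j in A| for every nonempty A."""
    n = len(sets)
    for r in range(1, n + 1):
        for A in combinations(range(n), r):
            if len(set().union(*(sets[j] for j in A))) < r:
                return False
    return True


rng = random.Random(3)
for _ in range(2_000):
    n = rng.randint(1, 6)
    family = [set(rng.sample(range(6), rng.randint(0, 3))) for _ in range(n)]
    assert has_sdr(family) == hall_condition(family)
```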


The marriage lemma is one of the most widely applied results in all 
of combinatorial theory, and it has many applications to the theory of 
inequalities. In particular, it is of great help with the final exercise which 
develops Birkhoff’s Theorem. 
Exercise 13.9 (Birkhoff’s Theorem)
Given a permutation $\sigma \in S_n$, the permutation matrix associated with $\sigma$ is the $n \times n$ matrix $P_\sigma = (P_\sigma(j, k) : 1 \le j, k \le n)$ with entries

$$
P_\sigma(j, k) = \begin{cases} 1 & \text{if } \sigma(j) = k, \\ 0 & \text{otherwise.} \end{cases}
$$

Show that if $D$ is an $n \times n$ doubly stochastic matrix, then there exist nonnegative weights $\{w_\sigma : \sigma \in S_n\}$ such that

$$
\sum_{\sigma \in S_n} w_\sigma = 1 \quad \text{and} \quad \sum_{\sigma \in S_n} w_\sigma P_\sigma = D. \qquad (13.27)
$$

In other words, every doubly stochastic matrix is an average of permutation matrices.
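A constructive version of Birkhoff's Theorem suggests itself once the marriage lemma is available: find a permutation $\sigma$ whose entries $D(j, \sigma(j))$ are all positive, subtract the largest possible multiple of $P_\sigma$, and repeat. The sketch below implements that peeling procedure with exact fractions. The helper names are hypothetical, and the fact that a suitable $\sigma$ always exists while the remainder is nonzero is exactly what the exercise asks one to justify with Exercise 13.8. Each round sets at least one entry to zero, so at most $n^2$ rounds are needed.

```python
from fractions import Fraction


def matching_on_support(D):
    """Return sigma (as a list) with D[j][sigma[j]] > 0 for all j, or None."""
    n = len(D)
    row_of_col = [-1] * n

    def augment(r, seen):
        for c in range(n):
            if D[r][c] > 0 and c not in seen:
                seen.add(c)
                if row_of_col[c] == -1 or augment(row_of_col[c], seen):
                    row_of_col[c] = r
                    return True
        return False

    if not all(augment(r, set()) for r in range(n)):
        return None
    sigma = [0] * n
    for c, r in enumerate(row_of_col):
        sigma[r] = c
    return sigma


def birkhoff_decomposition(D):
    """Peel permutation matrices off a doubly stochastic D; returns [(w, sigma), ...]."""
    R = [[Fraction(x) for x in row] for row in D]
    terms = []
    while any(x != 0 for row in R for x in row):
        sigma = matching_on_support(R)
        if sigma is None:
            raise ValueError("input is not doubly stochastic")
        w = min(R[j][sigma[j]] for j in range(len(R)))
        for j in range(len(R)):
            R[j][sigma[j]] -= w
        terms.append((w, tuple(sigma)))
    return terms


# A 3 x 3 doubly stochastic example.
F = Fraction
D = [[F(1, 2), F(1, 3), F(1, 6)],
     [F(1, 3), F(0), F(2, 3)],
     [F(1, 6), F(2, 3), F(1, 6)]]
for w, sigma in birkhoff_decomposition(D):
    print(w, sigma)
```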


14 


Cancellation and Aggregation 


Cancellation is not often discussed as a self-standing topic, yet it is the 
source of some of the most important phenomena in mathematics. Given 
any sum of real or complex numbers, we can always obtain a bound by 
taking the absolute values of the summands, but such a step typically 
destroys the more refined elements of our problem. If we hope to take 
advantage of cancellation, we must consider summands in groups. 

We begin with a classical result of Niels Henrik Abel (1802-1829) who 
is equally famous for his proof of the impossibility of solving the general 
quintic equation by radicals and for his brief tragic life. Abel’s inequal- 
ity is simple and well known, but it is also tremendously productive. 
Many applications of cancellation call on its guidance, either directly or 
indirectly. 


Problem 14.1 (Abel’s Inequality)

Let $z_1, z_2, \ldots, z_n$ denote a sequence of complex numbers with partial sums $S_k = z_1 + z_2 + \cdots + z_k$, $1 \le k \le n$. For each sequence of real numbers such that $a_1 \ge a_2 \ge \cdots \ge a_n \ge 0$ one has

$$
|a_1 z_1 + a_2 z_2 + \cdots + a_n z_n| \le a_1 \max_{1 \le k \le n} |S_k|. \qquad (14.1)
$$


MAKING PARTIAL SUMS MORE VISIBLE 


Part of the wisdom of Abel’s inequality is that it shifts our focus onto the maximal sequence $M_n = \max_{1 \le k \le n} |S_k|$, $n = 1, 2, \ldots$, even when our primary concern might be for the sums $a_1 z_1 + a_2 z_2 + \cdots + a_n z_n$. Shortly we will find that there are subtle techniques for dealing with maximal sequences, but first we should attend to Abel’s inequality and some of its consequences.

The challenge is to bound the modulus of $a_1 z_1 + a_2 z_2 + \cdots + a_n z_n$ with help from $\max_{1 \le k \le n} |S_k|$, so a natural first step is to use summation by parts to bring the partial sums $S_k = z_1 + z_2 + \cdots + z_k$ into view. Thus, we first note that

$$
\begin{aligned}
a_1 z_1 + a_2 z_2 + \cdots + a_n z_n &= a_1 S_1 + a_2 (S_2 - S_1) + \cdots + a_n (S_n - S_{n-1}) \\
&= S_1 (a_1 - a_2) + S_2 (a_2 - a_3) + \cdots + S_{n-1} (a_{n-1} - a_n) + S_n a_n.
\end{aligned}
$$

This identity (which is often called Abel’s formula) now leaves little for us to do. Since each difference $a_k - a_{k+1}$ is nonnegative, it shows that $|a_1 z_1 + a_2 z_2 + \cdots + a_n z_n|$ is bounded by

$$
\begin{aligned}
&|S_1|(a_1 - a_2) + |S_2|(a_2 - a_3) + \cdots + |S_{n-1}|(a_{n-1} - a_n) + |S_n| a_n \\
&\qquad \le \max_{1 \le k \le n} |S_k| \, \{(a_1 - a_2) + (a_2 - a_3) + \cdots + (a_{n-1} - a_n) + a_n\} \\
&\qquad = a_1 \max_{1 \le k \le n} |S_k|,
\end{aligned}
$$

and the (very easy!) proof of Abel’s inequality is complete.
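Abel's formula and the bound (14.1) are easy to test numerically. The sketch below, with a hypothetical helper `abel_terms`, compares the direct sum with the summation-by-parts form and with $a_1 \max_k |S_k|$ for random complex $z_k$ and random nonincreasing $a_k \ge 0$.

```python
import random
from itertools import accumulate


def abel_terms(a, z):
    """Return (direct sum, Abel's formula, a_1 * max |S_k|) for 0-based lists."""
    n = len(z)
    S = list(accumulate(z))  # partial sums S_1, ..., S_n
    direct = sum(ak * zk for ak, zk in zip(a, z))
    by_parts = sum(S[k] * (a[k] - a[k + 1]) for k in range(n - 1)) + S[-1] * a[-1]
    bound = a[0] * max(abs(s) for s in S)
    return direct, by_parts, bound


rng = random.Random(5)
for _ in range(1_000):
    n = rng.randint(1, 30)
    z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    a = sorted((rng.uniform(0, 3) for _ in range(n)), reverse=True)
    direct, by_parts, bound = abel_terms(a, z)
    assert abs(direct - by_parts) < 1e-9
    assert abs(direct) <= bound + 1e-9
```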


APPLICATIONS OF ABEL’S INEQUALITY 


Abel’s inequality may be close to trivial, but its consequences can be surprisingly elegant. Certainly it is the tool of choice when one asks about the convergence of sums such as

$$
Q = \sum_{k=1}^{\infty} \frac{(-1)^k}{\sqrt{k}} \quad \text{and} \quad R = \sum_{k=1}^{\infty} \frac{\cos(k\pi/6)}{\log(k+1)}.
$$

For example, in the first case Abel’s inequality gives the succinct bound

$$
\left|\sum_{k=M}^{N} \frac{(-1)^k}{\sqrt{k}}\right| \le \frac{1}{\sqrt{M}} \quad \text{for all } 1 \le M \le N < \infty. \qquad (14.2)
$$

This is more than one needs to show that the partial sums of $Q$ form a Cauchy sequence, so the sum $Q$ does indeed converge.

The second sum $R$ may look harder, but it is almost as easy. Since the sequence $\{\cos(k\pi/6) : k = 1, 2, \ldots\}$ is periodic with period 12, it is easy to check by brute force that

$$
\max_{1 \le M \le N < \infty} \left|\sum_{k=M}^{N} \cos(k\pi/6)\right| = 2 + \sqrt{3} = 3.732\ldots, \qquad (14.3)
$$

so Abel’s inequality gives us another simple bound

$$
\left|\sum_{k=M}^{N} \frac{\cos(k\pi/6)}{\log(k+1)}\right| \le \frac{2 + \sqrt{3}}{\log(M+1)} \quad \text{for all } 1 \le M \le N < \infty. \qquad (14.4)
$$

This bound suffices to show the convergence of $R$ and, moreover, one can check by numerical calculation that it has very little slack. For example, the constant $2 + \sqrt{3}$ cannot be replaced by a smaller one. Without foreknowledge of Abel’s inequality, one probably would not guess that the partial sums of $R$ would have such simple, sharp bounds.
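The brute-force check behind (14.3) takes only a few lines. Any 12 consecutive terms of $\cos(k\pi/6)$ sum to zero, so every block sum equals a block sum that starts within the first period and has fewer than 12 terms, and a finite search suffices. The sketch also spot-checks (14.4) for all $1 \le M \le N \le 300$; the helper names are hypothetical.

```python
import math


def c(k):
    return math.cos(k * math.pi / 6)


def largest_block_sum():
    """max |c(M) + ... + c(N)|; by periodicity, M <= 12 and N - M < 12 suffice."""
    return max(abs(sum(c(k) for k in range(M, N + 1)))
               for M in range(1, 13) for N in range(M, M + 12))


def check_14_4(max_N):
    """Spot-check (14.4) for all 1 <= M <= N <= max_N."""
    const = 2 + math.sqrt(3)
    for M in range(1, max_N + 1):
        partial = 0.0
        for N in range(M, max_N + 1):
            partial += c(N) / math.log(N + 1)
            if abs(partial) > const / math.log(M + 1) + 1e-12:
                return False
    return True


print(largest_block_sum(), 2 + math.sqrt(3))
print(check_14_4(300))
```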


THE ORIGINS OF CANCELLATION 


Cancellation has widely diverse origins, but bounds for partial sums of 
complex exponentials may provide the single most common source. Such 
bounds lie behind the two introductory examples (14.2) and (14.3), and, 
although these are particularly easy, they still point toward an important 
theme. 

Linear sums are the simplest exponential sums. Nevertheless, they can lead to subtle inferences, such as the bound (14.7) for the quadratic exponential sum which forms the core of our second challenge problem. To express the linear bound most simply, we use the common shorthand

$$
e(t) = \exp(2\pi i t) \quad \text{and} \quad \|t\| = \min\{|t - k| : k \in \mathbb{Z}\}; \qquad (14.5)
$$

so, here, $\|t\|$ denotes the distance from $t \in \mathbb{R}$ to the nearest integer. This use of the “double bar” notation is traditional in this context, and it should not lead to any confusion with the notation for a vector norm.


Problem 14.2 (Linear and Quadratic Exponential Sums)

First, as a useful warm-up, show that for all $t \in \mathbb{R}$ and all integers $M$ and $N$ one has the bounds

$$
\left|\sum_{k=M+1}^{M+N} e(kt)\right| \le \min\left\{N, \frac{1}{|\sin \pi t|}\right\} \le \min\left\{N, \frac{1}{2\|t\|}\right\}; \qquad (14.6)
$$

then, for a more engaging challenge, show that for $b, c \in \mathbb{R}$ and all integers $0 < M \le N$ one also has a uniform bound for the quadratic exponential sums,

$$
\left|\sum_{k=1}^{M} e\big((k^2 + bk + c)/N\big)\right| \le \sqrt{2N(1 + \log N)}. \qquad (14.7)
$$


LINEAR EXPONENTIAL SUMS AND THEIR ESTIMATES 


For a quick orientation, one should note that the bound (14.6) generalizes those which were used in the discussion of Abel’s inequality. For example, since $|\mathrm{Re}\, w| \le |w|$ we can set $t = 1/12$ in the bound (14.6) to obtain an estimate for the cosine sum

$$
\left|\sum_{k=M+1}^{M+N} \cos(k\pi/6)\right| \le \frac{1}{\sin(\pi/12)} = \frac{2\sqrt{2}}{\sqrt{3} - 1} = 3.8637\ldots
$$

This is remarkably close to the best possible bound (14.3), and the phenomenon it suggests is typical. If one must give a uniform estimate for a whole ensemble of linear sums, the estimate (14.6) is hard to beat, though, of course, it can be quite inefficient for many of the individual sums.

To prove the bound (14.6), one naturally begins with the formula for geometric summation,

$$
\sum_{k=M+1}^{M+N} e(kt) = e\big((M+1)t\big) \, \frac{e(Nt) - 1}{e(t) - 1},
$$

and, to bring the sine function into view, one has the factorization

$$
\frac{e(Nt) - 1}{e(t) - 1} = \frac{e(Nt/2)}{e(t/2)} \left\{ \frac{\big(e(Nt/2) - e(-Nt/2)\big)/2i}{\big(e(t/2) - e(-t/2)\big)/2i} \right\}.
$$

If we identify the bracketed fraction and take the absolute value, we find

$$
\left|\sum_{k=M+1}^{M+N} e(kt)\right| = \left|\frac{\sin(\pi N t)}{\sin(\pi t)}\right| \le \frac{1}{|\sin \pi t|}.
$$

Finally, to get the second part of the bound (14.6), one only needs to notice that the graph of $t \mapsto \sin \pi t$ makes it obvious that $2\|t\| \le |\sin \pi t|$.
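The two inequalities in (14.6), and the closed form $|\sin(\pi N t)/\sin(\pi t)|$ found above, can be spot-checked with complex floating-point arithmetic. In the sketch below, `e` and `dist_to_int` are our names for $e(t)$ and $\|t\|$ from (14.5). Values of $t$ that are numerically integers are skipped, since then $\sin \pi t = 0$ and only the bound $N$ applies.

```python
import cmath
import math
import random


def e(t):
    return cmath.exp(2j * math.pi * t)


def dist_to_int(t):
    return abs(t - round(t))


def linear_sum(t, M, N):
    return sum(e(k * t) for k in range(M + 1, M + N + 1))


rng = random.Random(1)
for _ in range(2_000):
    t = rng.uniform(-3, 3)
    if dist_to_int(t) < 1e-6:
        continue
    M, N = rng.randint(-50, 50), rng.randint(1, 200)
    value = abs(linear_sum(t, M, N))
    sine = abs(math.sin(math.pi * t))
    assert abs(value - abs(math.sin(math.pi * N * t)) / sine) < 1e-6
    assert value <= min(N, 1 / sine) + 1e-6
    assert 1 / sine <= 1 / (2 * dist_to_int(t)) + 1e-9
```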


AN EXPLORATION OF QUADRATIC EXPONENTIAL SUMS 


The geometric sum formula provided a ready-made plan for estimation 
of the linear sums, but the quadratic exponential sum (14.7) is further 
from our experience. Some experimentation seems appropriate before 
we try to settle on a plan. 

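If we consider a generic quadratic polynomial $P(k) = \alpha k^2 + \beta k + \gamma$ with $\alpha, \beta, \gamma \in \mathbb{R}$ and $k \in \mathbb{Z}$, we need to estimate the sum

$$
S_M(P) = \sum_{k=1}^{M} e(P(k)), \qquad (14.8)
$$

or, more precisely, we need to estimate the modulus $|S_M(P)|$ or its square $|S_M(P)|^2$. If we try brute force, we will need an $n$-term analog of the familiar formula $|c_1 + c_2|^2 = |c_1|^2 + |c_2|^2 + 2\,\mathrm{Re}\{c_1 \bar{c}_2\}$, and this calls for us to compute

$$
\begin{aligned}
\Big|\sum_{n=1}^{M} c_n\Big|^2 &= \sum_{n=1}^{M} |c_n|^2 + \sum_{1 \le m < n \le M} \{c_m \bar{c}_n + \bar{c}_m c_n\} \\
&= \sum_{n=1}^{M} |c_n|^2 + \sum_{1 \le m < n \le M} 2\,\mathrm{Re}\{c_n \bar{c}_m\} \\
&= \sum_{n=1}^{M} |c_n|^2 + 2\,\mathrm{Re} \sum_{h=1}^{M-1} \sum_{m=1}^{M-h} c_{m+h} \bar{c}_m.
\end{aligned}
\qquad (14.9)
$$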
n=1 h=1 m=1 


If we specialize the formula (14.9) by setting $c_n = e(P(n))$, then we come to the identity

$$
|S_M(P)|^2 = M + 2\,\mathrm{Re} \sum_{h=1}^{M-1} \sum_{m=1}^{M-h} e\big(P(m+h) - P(m)\big). \qquad (14.10)
$$


This formula may seem complicated, but if one looks past the clutter, 
it suggests an interesting opportunity. The inside sum contains the 
exponentials of differences of a quadratic polynomial, and, since such 
differences are simply linear polynomials, we can estimate the inside 
sum with help from the basic bound (14.6). 

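The difference $P(m + h) - P(m) = 2\alpha m h + \alpha h^2 + \beta h$ brings us to the factorization $e(P(m + h) - P(m)) = e(\alpha h^2 + \beta h)\, e(2\alpha m h)$, so, by the linear bound (14.6) with $t = 2\alpha h$, for the inside sum of the identity (14.10) we have the bound

$$
\left|\sum_{m=1}^{M-h} e\big(P(m+h) - P(m)\big)\right| = \left|\sum_{m=1}^{M-h} e(2\alpha h m)\right| \le \frac{1}{|\sin(2\pi \alpha h)|}. \qquad (14.11)
$$

Thus, for any real quadratic $P(k) = \alpha k^2 + \beta k + \gamma$ we have the estimate

$$
|S_M(P)|^2 \le M + 2 \sum_{h=1}^{M-1} \frac{1}{|\sin(2\pi \alpha h)|} \le M + \sum_{h=1}^{M-1} \frac{1}{\|2\alpha h\|}, \qquad (14.12)
$$

where $\|2\alpha h\|$ is the distance from $2\alpha h \in \mathbb{R}$ to the nearest integer. After setting $\alpha = 1/N$, $\beta = b/N$, and $\gamma = c/N$ in the estimate (14.12), we find a bound for our target sum

$$
\left|\sum_{k=1}^{M} e\big((k^2 + bk + c)/N\big)\right|^2 \le N + \sum_{h=1}^{N-1} \frac{1}{\|2h/N\|} \le N + 2N \sum_{1 \le h \le N/2} \frac{1}{h}, \qquad (14.13)
$$

where in the second step we used two facts. First, when $N$ is odd, the residues of $2h$ modulo $N$ run through $1, 2, \ldots, N - 1$ as $h$ does, so the middle sum equals $\sum_{h=1}^{N-1} 1/\|h/N\|$. Second, the fraction $h/N$ is closest to 0 for $1 \le h \le N/2$ while for $N/2 < h < N$ it is closest to 1. (When $N$ is even, the term with $h = N/2$ must instead be estimated by the trivial bound $M - h \le N/2$, and the same counting gives an estimate that is no larger than the final bound below.)

The logarithmic factor in the challenge bound (14.7) is no longer so mysterious; it is just the result of using the logarithmic bound for the harmonic series. Since $1 + 1/2 + \cdots + 1/m \le 1 + \log m$, we find that our estimate (14.13) is not larger than $N + 2N(1 + \log(N/2))$, which is bounded by $2N(1 + \log N)$ since $3 - 2\log 2 < 2$. After taking square roots, the solution of the second challenge problem is complete.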
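Both the identity (14.10) and the bound (14.7) can be tested numerically. The sketch below (hypothetical helper names) compares $|S_M(P)|^2$ with the right side of (14.10) for random real quadratics, and then checks (14.7) over all $1 \le M \le N$ for odd $N$, the case handled directly by the counting step above. It is a spot check with floating-point tolerances, not a proof.

```python
import cmath
import math
import random


def e(t):
    return cmath.exp(2j * math.pi * t)


def quadratic(a, b, c):
    """The polynomial P(k) = a k^2 + b k + c."""
    return lambda k: a * k * k + b * k + c


def S(P, M):
    """S_M(P) = e(P(1)) + ... + e(P(M)), as in (14.8)."""
    return sum(e(P(k)) for k in range(1, M + 1))


def rhs_14_10(P, M):
    """M + 2 Re sum_{h=1}^{M-1} sum_{m=1}^{M-h} e(P(m+h) - P(m))."""
    inner = sum(e(P(m + h) - P(m)) for h in range(1, M) for m in range(1, M - h + 1))
    return M + 2 * inner.real


rng = random.Random(9)
for _ in range(200):
    P = quadratic(*(rng.uniform(-1, 1) for _ in range(3)))
    M = rng.randint(1, 40)
    assert abs(abs(S(P, M)) ** 2 - rhs_14_10(P, M)) < 1e-8

for N in range(1, 60, 2):  # odd N only
    bound = math.sqrt(2 * N * (1 + math.log(N)))
    for _ in range(5):
        Q = quadratic(1 / N, rng.uniform(-5, 5) / N, rng.uniform(-5, 5) / N)
        partial = 0j
        for M in range(1, N + 1):
            partial += e(Q(M))
            assert abs(partial) <= bound + 1e-9
```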


THE ROLE OF AUTOCORRELATIONS 


The proof of the quadratic bound (14.7) relied on the general relation

$$
\Big|\sum_{n=1}^{N} c_n\Big|^2 \le \sum_{n=1}^{N} |c_n|^2 + 2 \sum_{h=1}^{N-1} \Big|\sum_{m=1}^{N-h} c_{m+h} \bar{c}_m\Big|, \qquad (14.14)
$$

which one obtains from the identity (14.9). This bound suggests that we focus on the autocorrelation sums which may be defined by setting

$$
\rho_N(h) = \sum_{m=1}^{N-h} c_{m+h} \bar{c}_m \quad \text{for all } 1 \le h < N. \qquad (14.15)
$$

If these are small on average, then the sum $|c_1 + c_2 + \cdots + c_N|$ should also be relatively small.

Our proof of the quadratic bound (14.7) exploited this principle with help from the sharp estimate (14.11) for $|\rho_N(h)|$, but such quantitative bounds are often lacking. More commonly we only have qualitative information with which we hope to answer qualitative questions. For example, if we assume that $|c_k| \le 1$ for all $k = 1, 2, \ldots$ and assume that

$$
\lim_{N \to \infty} \frac{\rho_N(h)}{N} = 0 \quad \text{for all } h = 1, 2, \ldots, \qquad (14.16)
$$

does it follow that $|c_1 + c_2 + \cdots + c_N|/N \to 0$ as $N \to \infty$? The answer to this question is yes, but the bound (14.14) cannot help us here.


LIMITATIONS AND A CHALLENGE 


Although the bound (14.14) is natural and general, it has serious limitations. In particular, it requires one to sum $|\rho_N(h)|$ over the full range $1 \le h < N$, and consequently its effectiveness is greatly eroded if the available estimates for $|\rho_N(h)|$ grow too quickly with $h$. For example, in a case where one has $hN^{1/2} \le |\rho_N(h)| \le 2hN^{1/2}$ the limit conditions (14.16) are all satisfied, yet the bound provided by (14.14) is useless since it is larger than $N^2$.
Such limitations suggest that it could be quite useful to have an analog of the bound (14.14) where one only uses the autocorrelations $\rho_N(h)$ for $1 \le h \le H$ where $H$ is a fixed integer. In 1931, J.G. van der Corput provided the world with just such an analog, and it forms the basis for our next challenge problem. We actually consider a streamlined version of van der Corput’s inequality which underscores the role of $\rho_N(h)$, the autocorrelation sum defined by formula (14.15).


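Problem 14.3 (A Qualitative van der Corput Inequality)
Show that for each complex sequence $c_1, c_2, \ldots, c_N$ and for each integer $1 \le H < N$ one has the inequality

$$
\Big|\sum_{n=1}^{N} c_n\Big|^2 \le \frac{4N}{H+1} \left\{ \sum_{n=1}^{N} |c_n|^2 + \sum_{h=1}^{H} |\rho_N(h)| \right\}. \qquad (14.17)
$$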


A QUESTION ANSWERED 

Before we address the proof of the bound (14.17), we should check that it does indeed answer the question which was posed on page 213. If we assume that for each $h = 1, 2, \ldots$ one has $\rho_N(h)/N \to 0$ as $N \to \infty$ and if we assume that $|c_k| \le 1$ for all $k$, then the bound (14.17) gives us

$$
\limsup_{N \to \infty} \frac{1}{N^2} \Big|\sum_{n=1}^{N} c_n\Big|^2 \le \frac{4}{H+1}. \qquad (14.18)
$$

Here $H$ is arbitrary, so we do find that $|c_1 + c_2 + \cdots + c_N|/N \to 0$ as $N \to \infty$, just as we hoped we would.

The cost — and the benefit — of van der Corput’s inequality are 
tied to the parameter H. It makes the bound (14.17) more complicated 
than its naive precursor (14.14), but this is the price one pays for added 
flexibility and precision. 


EXPLORATION AND PROOF 


The challenge bound (14.17) does not come with any overt hints for 
its proof, and, until a concrete idea presents itself, almost all one can 
do is explore the algebra of similar expressions. In particular, one might 
try to understand more deeply the relationships between a sequence and 
shifts of itself. 

To discuss such shifts without having to worry about boundary effects, it is often useful to take the finite sequence $c_1, c_2, \ldots, c_N$ and extend it to one which is doubly infinite by setting $c_k = 0$ for all $k \le 0$ and all $k > N$. If we then consider the sequence along with its shifts, some natural relationships start to become evident. For example, if one considers the original sequence and the first two shifts, we get the picture

and when we sum along the “down-left” diagonals we see that the extended sequence satisfies the identity

$$
3 \sum_{n=1}^{N} c_n = \sum_{n=1}^{N+2} \big(c_n + c_{n-1} + c_{n-2}\big).
$$

In exactly the same way, one can sum along the diagonals of an array with $H + 1$ rows to show that the extended sequence satisfies

$$
(H+1) \sum_{n=1}^{N} c_n = \sum_{n=1}^{N+H} \sum_{h=0}^{H} c_{n-h}. \qquad (14.19)
$$

This identity is not deep, but it does achieve two aims: it represents a generic sum in terms of its shifts and it introduces a free parameter $H$.


AN APPLICATION OF CAUCHY’S INEQUALITY 


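If we take absolute values and square the sum (14.19), we find

$$
(H+1)^2 \Big|\sum_{n=1}^{N} c_n\Big|^2 = \Big|\sum_{n=1}^{N+H} \sum_{h=0}^{H} c_{n-h}\Big|^2 \le \left\{ \sum_{n=1}^{N+H} \Big|\sum_{h=0}^{H} c_{n-h}\Big| \right\}^2,
$$

and this invites us to apply Cauchy’s inequality (and the 1-trick) to find

$$
(H+1)^2 \Big|\sum_{n=1}^{N} c_n\Big|^2 \le (N + H) \sum_{n=1}^{N+H} \Big|\sum_{h=0}^{H} c_{n-h}\Big|^2. \qquad (14.20)
$$

This estimate brings us close to our challenge bound (14.17); we just need to bring out the role of the autocorrelation sums. When we expand the absolute values and attend to the algebra, we find

$$
\begin{aligned}
\sum_{n=1}^{N+H} \Big|\sum_{h=0}^{H} c_{n-h}\Big|^2
&= \sum_{n=1}^{N+H} \sum_{j=0}^{H} \sum_{k=0}^{H} c_{n-j} \bar{c}_{n-k} \\
&= \sum_{n=1}^{N+H} \left\{ \sum_{s=0}^{H} |c_{n-s}|^2 + 2\,\mathrm{Re} \sum_{s=0}^{H-1} \sum_{t=s+1}^{H} c_{n-s} \bar{c}_{n-t} \right\} \\
&\le (H+1) \sum_{n=1}^{N} |c_n|^2 + 2 \sum_{s=0}^{H-1} \sum_{t=s+1}^{H} \Big|\sum_{n=1}^{N+H} c_{n-s} \bar{c}_{n-t}\Big| \\
&= (H+1) \sum_{n=1}^{N} |c_n|^2 + 2 \sum_{h=1}^{H} (H + 1 - h) \Big|\sum_{n=1}^{N-h} c_{n+h} \bar{c}_n\Big|.
\end{aligned}
$$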

This estimate, the Cauchy bound (14.20), and the trivial observation that $|z| = |\bar{z}|$ now combine to give us

$$
\Big|\sum_{n=1}^{N} c_n\Big|^2 \le \frac{N+H}{H+1} \sum_{n=1}^{N} |c_n|^2 + \frac{2(N+H)}{H+1} \sum_{h=1}^{H} \left(1 - \frac{h}{H+1}\right) \Big|\sum_{n=1}^{N-h} c_{n+h} \bar{c}_n\Big|.
$$

This is precisely the inequality given by van der Corput in 1931. When we reintroduce the autocorrelation sums and bound the coefficients in the simplest way (using $N + H \le 2N$ and $1 - h/(H+1) \le 1$), we come directly to the inequality (14.17) which was suggested by our challenge problem.
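Since every step above is an identity or a standard inequality, van der Corput's bound and its simplified form (14.17) can be spot-checked on random complex sequences. The sketch below (hypothetical helper names, 0-based indexing) also verifies the shift identity (14.19) with the zero extension used in the text.

```python
import random


def rho(c, h):
    """Autocorrelation (14.15), 0-based: sum_m c[m+h] * conj(c[m])."""
    return sum(c[m + h] * c[m].conjugate() for m in range(len(c) - h))


def van_der_corput_rhs(c, H):
    N = len(c)
    energy = sum(abs(x) ** 2 for x in c)
    tail = sum((1 - h / (H + 1)) * abs(rho(c, h)) for h in range(1, H + 1))
    return (N + H) / (H + 1) * energy + 2 * (N + H) / (H + 1) * tail


def simplified_rhs(c, H):
    """Right side of (14.17)."""
    N = len(c)
    energy = sum(abs(x) ** 2 for x in c)
    return 4 * N / (H + 1) * (energy + sum(abs(rho(c, h)) for h in range(1, H + 1)))


def shift_identity_gap(c, H):
    """(H+1) * sum(c) minus sum_{n=1}^{N+H} sum_{h=0}^{H} c_{n-h}, with c_k = 0 off 1..N."""
    N = len(c)

    def ext(k):
        return c[k - 1] if 1 <= k <= N else 0

    rhs = sum(ext(n - h) for n in range(1, N + H + 1) for h in range(H + 1))
    return (H + 1) * sum(c) - rhs


rng = random.Random(4)
for _ in range(1_000):
    N = rng.randint(2, 40)
    H = rng.randint(1, N - 1)
    c = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
    lhs = abs(sum(c)) ** 2
    assert lhs <= van_der_corput_rhs(c, H) + 1e-9
    assert van_der_corput_rhs(c, H) <= simplified_rhs(c, H) + 1e-9
    assert abs(shift_identity_gap(c, H)) < 1e-9
```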


CANCELLATION ON AVERAGE 


Many problems pivot on the distinction between phenomena that take place uniformly and phenomena that only take place on average. For example, to make good use of Abel’s inequality one needs a uniform bound on the partial sums $|S_k|$, $1 \le k \le n$, while van der Corput’s inequality can be effective even if we only have a good bound for the average value of $|\rho_N(h)|$ over the fixed range $1 \le h \le H$.

It is perhaps most common for problems that have a special role for “cancellation on average” to call on integrals rather than sums. To illustrate this phenomenon, we first recall that a sequence $\{\varphi_k : k \in S\}$ of complex-valued square integrable functions on $[0, 1]$ is said to be an orthonormal sequence provided that for all $j, k \in S$ one has

$$
\int_0^1 \varphi_j(x) \overline{\varphi_k(x)} \, dx = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k. \end{cases} \qquad (14.21)
$$

The leading example of such a sequence is $\varphi_k(x) = e(kx) = \exp(2\pi i k x)$, the sequence of complex exponentials which we have already found to be at the heart of many cancellation phenomena.

For any finite set $A \subset S$, the orthonormality conditions (14.21) and direct expansion lead one to the identity

$$
\int_0^1 \Big|\sum_{k \in A} c_k \varphi_k(x)\Big|^2 dx = \sum_{k \in A} |c_k|^2. \qquad (14.22)
$$

Thus, for $S_n(x) = c_1 \varphi_1(x) + c_2 \varphi_2(x) + \cdots + c_n \varphi_n(x)$, the application of Schwarz’s inequality gives us

$$
\int_0^1 |S_n(x)| \, dx \le \left\{ \int_0^1 |S_n(x)|^2 \, dx \right\}^{1/2} = \big(|c_1|^2 + |c_2|^2 + \cdots + |c_n|^2\big)^{1/2},
$$

and, if we assume that $|c_k| \le 1$ for all $1 \le k \le n$, then “on average” $|S_n(x)|$ is not larger than $\sqrt{n}$. The next challenge problem provides us with a bound for the maximal sequence $M_n(x) = \max_{1 \le k \le n} |S_k(x)|$ which is almost as good.
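For the exponential example $\varphi_k(x) = e(kx)$, the identity (14.22) can be checked exactly on a finite grid. The function $|S_n(x)|^2$ is a trigonometric polynomial with frequencies between $-(n-1)$ and $n-1$, and averaging $e(mx)$ over the $L$ points $x = 0, 1/L, \ldots, (L-1)/L$ gives 1 when $L$ divides $m$ and 0 otherwise, so with $L \ge n$ the grid average of $|S_n|^2$ equals its integral. The sketch below relies on that observation (ours, not the text's) and also confirms the Schwarz step for the grid average of $|S_n|$; the helper names are hypothetical.

```python
import cmath
import math
import random


def e(t):
    return cmath.exp(2j * math.pi * t)


def S_n(c, x):
    """c_1 e(x) + c_2 e(2x) + ... + c_n e(nx)."""
    return sum(ck * e((k + 1) * x) for k, ck in enumerate(c))


def grid_average(f, L):
    """Average of f over the L equally spaced points j/L, 0 <= j < L."""
    return sum(f(j / L) for j in range(L)) / L


rng = random.Random(8)
for _ in range(100):
    n = rng.randint(1, 25)
    c = [cmath.rect(rng.random(), rng.uniform(0, 2 * math.pi)) for _ in range(n)]  # |c_k| <= 1
    L = 2 * n  # any L >= n is exact for |S_n|^2
    mean_square = grid_average(lambda x: abs(S_n(c, x)) ** 2, L)
    assert abs(mean_square - sum(abs(ck) ** 2 for ck in c)) < 1e-9
    mean_abs = grid_average(lambda x: abs(S_n(c, x)), L)
    assert mean_abs <= math.sqrt(mean_square) + 1e-12 <= math.sqrt(n) + 1e-9
```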


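Problem 14.4 (Rademacher–Menchoff Inequality)
Given that the functions $\varphi_k : [0, 1] \to \mathbb{C}$, $1 \le k \le n$, are orthonormal, show that the partial sums

$$
S_k(x) = c_1 \varphi_1(x) + c_2 \varphi_2(x) + \cdots + c_k \varphi_k(x), \qquad 1 \le k \le n,
$$

satisfy the maximal inequality

$$
\int_0^1 \max_{1 \le k \le n} |S_k(x)|^2 \, dx \le \log_2^2(4n) \sum_{k=1}^{n} |c_k|^2. \qquad (14.23)
$$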

This is known as the Rademacher—Menchoff inequality, and it is surely 
among the most important results in the theory of orthogonal series. For 
us, much of the charm of the Rademacher—Menchoff inequality rests in 
its proof and, without giving away too much of the story, one may say 
in advance that the proof pivots on an artful application of Cauchy’s 
inequality. Moreover, the proof encourages one to explore some fun- 
damental grouping ideas which have applications in combinatorics, the 
theory of algorithms, and many other fields. 
POSING A COMBINATORIAL QUESTION


Our goal is to bound the integral of $\max_{1 \le k \le n} |S_k(x)|^2$, and our only tool is the orthogonality identity (14.22). We need to find some way to exploit the full strength of this identity; in particular, we need to exploit the fact that it holds for all possible choices of $A \subseteq \{1, 2, \ldots, n\}$. This advice is vague, but it still suggests some relevant combinatorial questions.


For example, is there a “reasonably small” collection $\mathcal{B}$ of subsets of $\{1, 2, \ldots, n\}$ such that each of the initial segments

$$
I_k = \{1, 2, \ldots, k\}, \qquad 1 \le k \le n,
$$

can be written as a disjoint union of a “reasonably small” number of elements of $\mathcal{B}$? An affirmative answer would suggest that we might get a useful bound on the integral of $\max_{1 \le k \le n} |S_k(x)|^2$ by using the identity (14.22) on each element of $\mathcal{B}$.

Our experience with binary representations reminds us that integers have succinct representations as sums of powers of two, so perhaps we should seek an analogous representation for the sets $\{I_k : 1 \le k \le n\}$. For example, we might try to show that each $I_k$ can be written as a disjoint union of a small number of blocks with length $2^s$ where $s$ may run between 0 and $\lfloor \log_2 n \rfloor$.

To translate this suggestion into a formal plan, we first let $[a, b]$ denote the interval of integers $\{a, a + 1, \ldots, b\}$, and we let $\mathcal{B}$ denote the set of all integer intervals of the form

$$
[r 2^s + 1, (r + 1) 2^s] \quad \text{where } 0 \le r < \infty, \ 0 \le s < \infty,
$$

and where $[r 2^s + 1, (r + 1) 2^s] \subseteq [1, n]$.

Now, for any integer $k \in [1, n]$, we can easily produce a collection of sets $C(k) \subseteq \mathcal{B}$ such that

$$
[1, k] = \bigcup_{B \in C(k)} B, \qquad (14.24)
$$

but, if we exercise some care, we can also keep a tight control on $|C(k)|$, the cardinality of the collection $C(k)$.


A GREEDY ALGORITHM 


A natural way to construct the desired collection $C(k)$ of binary intervals is to use a greedy algorithm. For example, to represent $[1, k]$, we first take the largest element $B$ of $\mathcal{B}$ that begins with 1 and is contained in $[1, k]$, and we remove the elements of $B$ from $[1, k]$. Except when $k$ is a power of 2, the first step leaves us with a nonempty interval of the form $[x, k]$ where $x$ is equal to $2^s + 1$ for some integer $s$. We then apply the same greedy idea to $[x, k]$.

On the second step, we find the largest element $B$ in $\mathcal{B}$ that begins with $x$ and is contained in $[x, k]$, and we remove the elements of $B$ from $[x, k]$. This time, if the remaining set is nonempty, its first element must be of the form $r 2^s + 1$ for some choice of integers $r$ and $s$. The greedy removal process then continues until one gets down to the empty set.

If we count the number of steps taken by the greedy algorithm we find that it is simply the number of 1’s in the binary expansion of $k$. Since the number of such 1’s is at most $\lceil \log_2(k) \rceil$ when $k \ge 2$, we have a useful cardinality bound $|C(k)| \le \lceil \log_2(n) \rceil$ for all $1 \le k \le n$, provided $n \ge 2$.

For a quick confirmation of the construction, one might consider the interval $I_{27} = [1, 27]$. In base 2 one writes 27 as 11011, and we find that the greedy algorithm provides a representation for $I_{27}$ as a 4-term union

$$
[1, 27] = \{1, 2, \ldots, 16\} \cup \{17, 18, \ldots, 24\} \cup \{25, 26\} \cup \{27\}.
$$
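The greedy construction is short enough to write out in full. In the sketch below (hypothetical helper names), `greedy_blocks(k)` returns $C(k)$ as a list of (first, last) pairs and `dyadic_family(n)` lists every dyadic interval $[r2^s + 1, (r+1)2^s]$ contained in $[1, n]$. The checks confirm the example for $[1, 27]$, the equality between $|C(k)|$ and the number of 1's in the binary expansion of $k$, and the fact used later that each $j \in [1, n]$ lies in at most one block of each length $2^s$, hence in at most $\lfloor \log_2 n \rfloor + 1$ dyadic intervals.

```python
def greedy_blocks(k):
    """Greedy decomposition of [1, k] into dyadic blocks [r*2**s + 1, (r + 1)*2**s]."""
    blocks, start = [], 1
    while start <= k:
        s = 0
        # A block of length 2**(s+1) may start at `start` only if start - 1 is a
        # multiple of 2**(s+1); it must also fit inside [start, k].
        while (start - 1) % 2 ** (s + 1) == 0 and start - 1 + 2 ** (s + 1) <= k:
            s += 1
        end = start - 1 + 2 ** s
        blocks.append((start, end))
        start = end + 1
    return blocks


def dyadic_family(n):
    """All intervals [r*2**s + 1, (r + 1)*2**s] contained in [1, n]."""
    family, s = [], 0
    while 2 ** s <= n:
        family += [(r * 2 ** s + 1, (r + 1) * 2 ** s) for r in range(n // 2 ** s)]
        s += 1
    return family


print(greedy_blocks(27))  # [(1, 16), (17, 24), (25, 26), (27, 27)]
```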


SUMS AND AN OPPORTUNITY FOR CAUCHY’S INEQUALITY 


We now need to see how our set representations are related to partial sums such as those in our challenge problem. Still, to keep the combinatorial essentials in clear view, we keep $\varphi_j(x)$ out of the picture for the moment, and we simply focus on partial sums of complex numbers $a_j$, $1 \le j \le n$.

From our representation of $[1, k]$ as the union of the sets in $C(k)$, we have a representation of a generic partial sum,

$$
a_1 + a_2 + \cdots + a_k = \sum_{B \in C(k)} \sum_{j \in B} a_j.
$$

The benefit of this representation is that the index set $C(k)$ of the outside sum is reasonably small, so one can apply Cauchy’s inequality (and the 1-trick) to the outside sum to find

$$
|a_1 + a_2 + \cdots + a_k|^2 \le |C(k)| \sum_{B \in C(k)} \Big|\sum_{j \in B} a_j\Big|^2. \qquad (14.25)
$$


One should now have high hopes of finding a useful estimate for the last sum; after all, it is a sum of squares, and we have already studied such sums at considerable length. If we prepare for the worst, we have

$$
|C(k)| \le \lceil \log_2(n) \rceil \quad \text{and} \quad C(k) \subseteq \mathcal{B},
$$

so the double sum bound (14.25) gives us

$$
\max_{1 \le k \le n} |a_1 + a_2 + \cdots + a_k|^2 \le \lceil \log_2(n) \rceil \sum_{B \in \mathcal{B}} \Big|\sum_{j \in B} a_j\Big|^2, \qquad (14.26)
$$

and this offers several signs of progress. On the left side one finds a maximal sequence $\max_{1 \le k \le n} |a_1 + a_2 + \cdots + a_k|^2$ of the kind we hoped to estimate, while on the right side we find a sum of squares which does not depend on the index value $1 \le k \le n$. Honest bookkeeping should carry us the rest of the way.


A FINAL ACCOUNTING 


If we simply replace $a_j$ by $c_j \varphi_j(x)$ in the bound (14.26) and recall our notation $S_k(x)$ for the partial sums of the $c_j \varphi_j(x)$, $1 \le j \le n$, then we find

$$
\max_{1 \le k \le n} |S_k(x)|^2 \le \lceil \log_2(n) \rceil \sum_{B \in \mathcal{B}} \Big|\sum_{j \in B} c_j \varphi_j(x)\Big|^2.
$$

Now, if we integrate both sides, then we see that the basic orthonormality identity (14.22) tells us that

$$
\int_0^1 \max_{1 \le k \le n} |S_k(x)|^2 \, dx \le \lceil \log_2(n) \rceil \sum_{B \in \mathcal{B}} \sum_{j \in B} |c_j|^2, \qquad (14.27)
$$

which is almost our target inequality. For each $j \in [1, n]$ there are at most $1 + \lceil \log_2(n) \rceil$ sets $B \in \mathcal{B}$ such that $j \in B$, so we see that inequality (14.27) gives us the bound

$$
\int_0^1 \max_{1 \le k \le n} |S_k(x)|^2 \, dx \le \lceil \log_2(n) \rceil \big(1 + \lceil \log_2(n) \rceil\big) \sum_{j=1}^{n} |c_j|^2. \qquad (14.28)
$$

This bound is actually a bit stronger than the one asserted by the Rademacher–Menchoff inequality (14.23) since for all $n \ge 1$ we have the bound $\lceil \log_2(n) \rceil (1 + \lceil \log_2(n) \rceil) \le (2 + \log_2 n)^2 = \log_2^2(4n)$.


CANCELLATION AND AGGREGATION 


The Rademacher–Menchoff inequality and van der Corput’s inequality provide natural illustrations of the twin themes of cancellation and aggregation. They are also two of history’s finest examples of pure “Cauchy–Schwarz technique.” They contribute to one’s effectiveness as a problem solver, and they provide a fitting end to our class, which is not over just yet. Here, as in all the earlier chapters, the exercises are at the heart of the matter.
EXERCISES


The first few exercises lean on Abel’s inequality and, among other things, they provide an analog for increasing multipliers and an analog for integrals. To help with the latter, Exercise 14.2 develops the slippery “second mean value formula” for integrals. This handy tool is also used to obtain the so-called van der Corput lemmas, two elementary bounds which turn out to be of fundamental help when facing cancellation in integrals.

The next few exercises address diverse aspects of cancellation, including the exploitation of complete exponential sums, the dyadic trick, and variations on the Rademacher–Menchoff inequality. Lower bounds for complex sums are entertained for the first time in Exercise 14.9, and Exercise 14.10 provides our first example of a domination inequality.

The final exercise develops Selberg’s inequality. At first, it may seem to be simply a messy variation on Bessel’s inequality, but the added complexity and generality serve a genuine purpose. The applications of Selberg’s inequality in combinatorics, number theory, and numerical analysis could fill a book, perhaps even a proper sequel to the Cauchy–Schwarz Master Class.


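Exercise 14.1 (Abel’s Second Inequality)

Show that for each nondecreasing sequence of nonnegative real numbers $0 \le b_1 \le b_2 \le \cdots \le b_n$, one has a bound which differs slightly from Abel’s first inequality,

$$|a_1b_1 + a_2b_2 + \cdots + a_nb_n| \le 2b_n \max_{1 \le k \le n} |a_1 + a_2 + \cdots + a_k|. \tag{14.29}$$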
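A quick numerical check of (14.29) may help fix the statement before one tries to prove it. The helper name `abel_second` is ours and the random test data are arbitrary; the example at the end shows that the constant 2 is actually attained, so it cannot be lowered.

```python
import random


def abel_second(a, b):
    """Return (|sum a_k b_k|, 2 b_n max_k |a_1 + ... + a_k|), the two sides of (14.29)."""
    partial, biggest = 0.0, 0.0
    for x in a:
        partial += x
        biggest = max(biggest, abs(partial))
    lhs = abs(sum(x * y for x, y in zip(a, b)))
    return lhs, 2 * b[-1] * biggest


rng = random.Random(1)
for _ in range(2000):
    n = rng.randint(1, 20)
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    b = sorted(rng.uniform(0.0, 5.0) for _ in range(n))  # 0 <= b_1 <= ... <= b_n
    lhs, rhs = abel_second(a, b)
    assert lhs <= rhs + 1e-9

# Both sides equal 2 here, so the factor 2 in (14.29) is sharp.
print(abel_second([-1.0, 2.0], [0.0, 1.0]))  # (2.0, 2.0)
```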


Exercise 14.2 (The Integral Mean Value Formulas)

The first integral mean value formula (IMVF) asserts that for each continuous $f : [a,b] \to \mathbb{R}$ and each integrable $g : [a,b] \to [0, \infty)$, there is a $\xi \in [a,b]$ such that

$$\int_a^b f(x) g(x)\,dx = f(\xi) \int_a^b g(x)\,dx, \tag{14.30}$$

while the second IMVF is the slightly trickier assertion that for each differentiable nonincreasing function $\psi : [a,b] \to (0, \infty)$ and each integrable function $\phi : [a,b] \to \mathbb{R}$, there is a $\xi \in [a,b]$ such that

$$\int_a^b \psi(x) \phi(x)\,dx = \psi(a) \int_a^\xi \phi(x)\,dx. \tag{14.31}$$

Prove these formulas. They are both quite handy, and the second one may be trickier than you might guess.
Exercise 14.3 (An Integral Analog to Abel’s Inequality)

If $f : [a,b] \to (0, \infty)$ is a nonincreasing function, then for each integrable function $g : [a,b] \to \mathbb{R}$ one has the bound

$$\Big| \int_a^b f(x) g(x)\,dx \Big| \le f(a) \sup_{a \le y \le b} \Big| \int_a^y g(x)\,dx \Big|, \tag{14.32}$$

which is the natural integral analog of Abel’s inequality. Prove the bound (14.32) and show that it implies

$$\Big| \int_a^b \frac{e^{ix}}{x}\,dx \Big| \le \frac{2}{a} \quad\text{for all } 0 < a < b < \infty. \tag{14.33}$$
ei Sap a 


Exercise 14.4 (van der Corput on Oscillatory Integrals)

(a) Given a differentiable function $\theta : [a,b] \to \mathbb{R}$ for which the derivative $\theta'(\cdot)$ is monotonic and satisfies $\theta'(x) \ge \nu > 0$ for all $x \in [a,b]$, show that one has the bound

$$\Big| \int_a^b e^{i\theta(x)}\,dx \Big| \le \frac{4}{\nu}. \tag{14.34}$$

(b) Use the bound (14.34) to show that if $\theta : [a,b] \to \mathbb{R}$ is a twice differentiable function with $\theta''(x) \ge \rho > 0$ for all $x \in [a,b]$, then

$$\Big| \int_a^b e^{i\theta(x)}\,dx \Big| \le \frac{8}{\sqrt{\rho}}. \tag{14.35}$$

These workhorses lie behind many basic cancellation arguments for integrals and sums. They also come to us from the same J. G. van der Corput who gave us our third challenge problem. In fact, these may be the best known of van der Corput’s many inequalities, even though they are notably less subtle than the bound (14.17).
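To see the sizes involved, one can integrate $e^{ix^2}$ numerically. With $\theta(x) = x^2$ one has $\theta'(x) = 2x \ge 2$ on $[1, 10]$ (and $\theta'$ is monotone) and $\theta''(x) = 2$ everywhere, so (14.34) and (14.35) predict bounds of $2$ and $8/\sqrt{2} \approx 5.66$. The choice of $\theta$, the intervals, and the Simpson rule helper are our own illustrative assumptions, not part of the exercise.

```python
import cmath
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def phase_integral(theta, a, b):
    """Numerical value of the integral of exp(i theta(x)) over [a, b]."""
    return simpson(lambda x: cmath.exp(1j * theta(x)), a, b)


def theta(x):
    return x * x  # theta'(x) = 2x is monotone, theta''(x) = 2


first = abs(phase_integral(theta, 1.0, 10.0))     # (14.34) with nu = 2 on [1, 10]
second = abs(phase_integral(theta, -10.0, 10.0))  # (14.35) with rho = 2 on [-10, 10]
print(first <= 4 / 2, second <= 8 / math.sqrt(2))  # True True
```

The second value lands close to the Fresnel value $\sqrt{\pi} \approx 1.77$, well inside the bound, which is a reminder that the constants in (14.34) and (14.35) are chosen for convenience rather than sharpness.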


Exercise 14.5 (The “Extend and Conquer” Paradigm)

First show that for integers $m$ and $j$ one has the formula

$$\sum_{k=0}^{m-1} \exp(2\pi i jk/m) = \begin{cases} 0 & \text{if } m \text{ does not divide } j, \\ m & \text{if } m \text{ does divide } j. \end{cases} \tag{14.36}$$

This formula tells us that for such a complete sum one either has total cancellation, or no cancellation at all. There are many remarkable consequences of this elementary observation.

For example, use it to show that for each prime $p \ge 3$, and each pair $A$ and $B$ of subsets of $\mathbb{F}_p = \{0, 1, 2, \ldots, p-1\}$, one has

$$\Big| \sum_{j \in A} \sum_{k \in B} \exp(2\pi i jk/p) \Big| \le p^{1/2} |A|^{1/2} |B|^{1/2}. \tag{14.37}$$
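Both (14.36) and (14.37) are easy to test by direct summation. The sketch below uses floating point complex exponentials, so comparisons use a small tolerance; the helper names `e` and `complete_sum` are ours.

```python
import cmath
import math
import random


def e(t):
    """e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * math.pi * t)


def complete_sum(j, m):
    """The complete sum of exp(2 pi i j k / m) over k = 0, 1, ..., m - 1."""
    return sum(e(j * k / m) for k in range(m))


# (14.36): either total cancellation or none at all.
for m in range(1, 12):
    for j in range(-30, 31):
        expected = m if j % m == 0 else 0
        assert abs(complete_sum(j, m) - expected) < 1e-9

# (14.37) for random subsets A and B of F_p.
rng = random.Random(2)
for p in (3, 5, 7, 11, 13):
    for _ in range(200):
        A = [x for x in range(p) if rng.random() < 0.5]
        B = [x for x in range(p) if rng.random() < 0.5]
        s = sum(e(j * k / p) for j in A for k in B)
        assert abs(s) <= math.sqrt(p * len(A) * len(B)) + 1e-9
```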


Exercise 14.6 (Another Dyadic Passage)

Sometimes we have an estimate for $f(x)$ and we would like an estimate of $g(x)$, but we cannot show $g(x) \le f(x)$. We may still be able to get a useful bound on $g(x)$ if we only know that $f(x)$ dominates “half” of $g$ in the sense that

$$g(x) - g(x/2) \le f(x) \quad\text{for all } x \ge 0.$$

To be specific, assume that $g$ is continuous and show that if $f(x) \le Ax + B$ for $x \ge 0$, then $g$ satisfies the (only slightly worse) bound $g(x) \le A'x + B' \log_2(x) + C'$ for $x \ge 1$ for appropriate constants $A'$, $B'$, and $C'$.


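Exercise 14.7 (Rademacher–Menchoff with Weights)

Let $\psi_1, \psi_2, \ldots, \psi_n$ be real-valued functions for which

$$\int_0^1 \psi_j^2(x)\,dx = 1 \quad\text{and}\quad \int_0^1 \psi_j(x) \psi_k(x)\,dx = a_{jk}. \tag{14.38}$$

Show that if there exists a constant $C$ such that

$$\sum_{j=1}^n \sum_{k=1}^n a_{jk} y_j y_k \le C \sum_{j=1}^n y_j^2 \tag{14.39}$$

for any $n$ real numbers $y_1, y_2, \ldots, y_n$, then we also have

$$\int_0^1 \max_{1 \le k \le n} \Big( \sum_{j=1}^k c_j \psi_j(x) \Big)^2 dx \le C \log_2^2(4n) \sum_{j=1}^n c_j^2 \tag{14.40}$$

for all real $c_1, c_2, \ldots, c_n$.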


Exercise 14.8 (Functions with Geometric Dependence)

If the constant $\rho$ satisfies $0 < \rho < 1$ and the sequence of functions $\{\psi_j\}$ satisfies

$$\Big| \int_0^1 \psi_j(x) \psi_k(x)\,dx \Big| \le \rho^{|j-k|} \Big( \int_0^1 \psi_j^2(x)\,dx \Big)^{1/2} \Big( \int_0^1 \psi_k^2(x)\,dx \Big)^{1/2}$$

for all $1 \le j, k \le n$, then there is a constant $M$ depending only on $\rho$ such that the partial sums $S_k(x) = \psi_1(x) + \psi_2(x) + \cdots + \psi_k(x)$, $1 \le k \le n$, satisfy the maximal inequality

$$\int_0^1 \max_{1 \le k \le n} S_k^2(x)\,dx \le M \log_2^2(4n) \sum_{j=1}^n \int_0^1 \psi_j^2(x)\,dx.$$

Fig. 14.1. The set $S(\theta)$ consists of all of the elements $z_j$ contained in the half-plane $H(\theta)$ given by $\{z : \operatorname{Re}(z e^{-i\theta}) > 0\}$. To find a subset of $S = \{z_1, z_2, \ldots, z_n\}$ whose sum has a large absolute value, why not first consider just the subsets $S(\theta)$ for $\theta \in (0, 2\pi)$?


Exercise 14.9 (The Subset Lower Bound)

Show that for complex numbers $z_1, z_2, \ldots, z_n$ one has

$$\frac{1}{\pi} \sum_{j=1}^n |z_j| \le \max_{I \subseteq \{1, 2, \ldots, n\}} \Big| \sum_{j \in I} z_j \Big|, \tag{14.41}$$

and show that the constant factor $1/\pi$ cannot be replaced by a larger one. The qualitative message of this cancellation story is that there is always some subset with a sum whose modulus is a large fraction of the sum of all the moduli. For a hint one might consider the special subsets $S(\theta)$ defined in Figure 14.1.
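For small $n$ one can test (14.41) by brute force. The helper `best_subset_sum` (our own name) lists all $2^n$ subset sums, so it is only practical for $n$ up to about 20. For equally spaced points on the unit circle the ratio of the best subset sum to $\sum |z_j|$ decreases toward $1/\pi$, which is consistent with the claim that $1/\pi$ cannot be improved; this is an illustration, not a proof.

```python
import cmath
import math
import random


def best_subset_sum(z):
    """max over all subsets I of |sum_{j in I} z_j|, by listing all 2^n subset sums."""
    sums = [0j]
    for w in z:
        sums += [s + w for s in sums]
    return max(abs(s) for s in sums)


rng = random.Random(3)
for _ in range(300):
    n = rng.randint(1, 10)
    z = [rng.uniform(0.0, 2.0) * cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))
         for _ in range(n)]
    assert sum(abs(w) for w in z) / math.pi <= best_subset_sum(z) + 1e-12

# Prints 4 0.354, 8 0.327, 16 0.32: decreasing toward 1/pi (about 0.318).
for n in (4, 8, 16):
    roots = [cmath.exp(2j * math.pi * k / n) for k in range(n)]
    print(n, round(best_subset_sum(roots) / n, 3))
print(round(1 / math.pi, 3))  # 0.318
```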


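Exercise 14.10 (A Domination Principle)

If the complex numbers $a_n$ satisfy the bounds $|a_n| \le A_n$, $1 \le n \le N$, then the complex array $\{y_{nr} : 1 \le n \le N, 1 \le r \le R\}$ satisfies the bounds

$$\sum_{r=1}^R \sum_{s=1}^R \Big| \sum_{n=1}^N a_n y_{nr} \overline{y_{ns}} \Big|^2 \le \sum_{r=1}^R \sum_{s=1}^R \Big| \sum_{n=1}^N A_n y_{nr} \overline{y_{ns}} \Big|^2. \tag{14.42}$$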
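The domination principle (14.42) can be tested on random complex data. The display above was reconstructed from a damaged source, and the placement of the complex conjugate on $y_{ns}$ in this sketch follows that reading; the function name `quartic_form` is ours.

```python
import cmath
import math
import random


def quartic_form(coeffs, y):
    """Sum over r, s of |sum_n coeffs[n] * y[n][r] * conj(y[n][s])|^2."""
    N, R = len(y), len(y[0])
    total = 0.0
    for r in range(R):
        for s in range(R):
            inner = sum(coeffs[n] * y[n][r] * y[n][s].conjugate() for n in range(N))
            total += abs(inner) ** 2
    return total


rng = random.Random(4)
for _ in range(300):
    N, R = rng.randint(1, 6), rng.randint(1, 6)
    A = [rng.uniform(0.0, 2.0) for _ in range(N)]
    # any complex a_n with |a_n| <= A_n
    a = [A_n * rng.uniform(0.0, 1.0) * cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))
         for A_n in A]
    y = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(R)]
         for _ in range(N)]
    lhs, rhs = quartic_form(a, y), quartic_form(A, y)
    assert lhs <= rhs * (1 + 1e-12) + 1e-12
```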
Exercise 14.11 (An Inequality of P. Enflo)

Show that for vectors $u_m$, $1 \le m \le M$, and $v_n$, $1 \le n \le N$, in the inner product space $\mathbb{C}^d$ one has the bound

$$\sum_{m=1}^M \sum_{n=1}^N |\langle u_m, v_n \rangle|^2 \le \Big\{ \sum_{m=1}^M \sum_{\mu=1}^M |\langle u_m, u_\mu \rangle|^2 \Big\}^{1/2} \Big\{ \sum_{n=1}^N \sum_{\nu=1}^N |\langle v_n, v_\nu \rangle|^2 \Big\}^{1/2}.$$


Exercise 14.12 (Selberg’s Inequality)

Prove that if $x$ and $y_1, y_2, \ldots, y_n$ are elements of a real or complex inner product space, then we have

$$\sum_{j=1}^n \frac{|\langle x, y_j \rangle|^2}{\sum_{k=1}^n |\langle y_j, y_k \rangle|} \le \langle x, x \rangle. \tag{14.43}$$

Selberg’s inequality can sometimes be used as a replacement for the orthonormality identity (14.22) or Bessel’s inequality (4.29) when the elements $y_1, y_2, \ldots, y_n$ are only approximately orthogonal. Techniques for relaxing the requirements of orthonormality have important consequences throughout probability, number theory, and combinatorics.
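A randomized check of Selberg’s inequality (14.43) for real vectors is a useful sanity test before attempting a proof. In this sketch (names ours) a term with $y_j = 0$ is simply skipped, which is our own convention for the $0/0$ case; for an orthonormal family the left side reduces to the Bessel sum, as the last line shows.

```python
import random


def dot(u, v):
    return sum(s * t for s, t in zip(u, v))


def selberg_sum(x, ys):
    """Left side of (14.43) for real vectors; terms with y_j = 0 are skipped."""
    total = 0.0
    for yj in ys:
        weight = sum(abs(dot(yj, yk)) for yk in ys)
        if weight > 0:
            total += dot(x, yj) ** 2 / weight
    return total


rng = random.Random(5)
for _ in range(500):
    d, n = rng.randint(1, 5), rng.randint(1, 8)
    x = [rng.gauss(0, 1) for _ in range(d)]
    ys = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    assert selberg_sum(x, ys) <= dot(x, x) + 1e-9

# For an orthonormal family the weights are all 1 and we recover Bessel's sum.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(selberg_sum([3.0, 4.0, 12.0], basis))  # 25.0
```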


Solutions to the Exercises 


CHAPTER 1: STARTING WITH CAUCHY 


SOLUTION FOR EXERCISE 1.1. The first inequality follows by applying Cauchy’s inequality to $\{a_k\}$ and $\{b_k\}$ where one takes $b_k = 1$ for all $k$. In isolation, this “1-trick” is almost trivial, but it is remarkably general: every sum can be estimated in this way. The art is rather one of anticipating when the resulting estimate might prove to be helpful. For the second problem we apply Cauchy’s inequality to the product of $\{a_k^{1/3}\}$ and $\{a_k^{2/3}\}$. This is a simple instance of the “splitting trick” where one estimates the sum of the $a_k$ by Cauchy’s inequality after writing $a_k$ as a product $a_k = b_kc_k$. Almost every chapter will make some use of the splitting trick, and some of these applications are remarkably subtle.


SOLUTION FOR EXERCISE 1.2. This is another case for the splitting 
trick; one just applies Cauchy’s inequality to the sum 


SOLUTION FOR EXERCISE 1.3. The first inequality just requires two applications of Cauchy’s inequality according to the grouping $a_k(b_kc_k)$, but one might wander around a bit before hitting on the proof of the second inequality.

One key to the proof of the second bound comes from noting that when we substitute $a_k = b_k = c_k = 1$ we get the lackluster bound $n^2 \le n^3$. This suggests the inequality is not particularly strong, and it encourages us to look for a cheap shot. One might then think to deal with the $c_k$ factors by introducing

$$\hat{c}_k = c_k \big/ (c_1^2 + c_2^2 + \cdots + c_n^2)^{1/2},$$

so the target inequality would follow if we could show

$$\Big( \sum_{k=1}^n a_kb_k\hat{c}_k \Big)^2 \le \Big( \sum_{k=1}^n a_k^2 \Big) \Big( \sum_{k=1}^n b_k^2 \Big),$$

but this bound is an immediate consequence of the usual Cauchy inequality and the trivial observation that $|\hat{c}_k| \le 1$.


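SOLUTION FOR EXERCISE 1.4. For part (a) we note by Cauchy’s inequality and the 1-trick that we have

$$\sqrt{\frac{x+y}{x+y+z}} + \sqrt{\frac{x+z}{x+y+z}} + \sqrt{\frac{y+z}{x+y+z}} \le (1^2 + 1^2 + 1^2)^{1/2} \Big( \frac{x+y}{x+y+z} + \frac{x+z}{x+y+z} + \frac{y+z}{x+y+z} \Big)^{1/2} = \sqrt{6}.$$

For part (b) we apply Cauchy’s inequality to the splitting

$$x + y + z = \frac{x}{\sqrt{y+z}} \sqrt{y+z} + \frac{y}{\sqrt{x+z}} \sqrt{x+z} + \frac{z}{\sqrt{x+y}} \sqrt{x+y}.$$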


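SOLUTION FOR EXERCISE 1.5. From Cauchy’s inequality, the splitting $p_k = p_k^{1/2} p_k^{1/2}$ and the identity $\cos^2(x) = \{1 + \cos(2x)\}/2$, one finds

$$g^2(x) = \Big( \sum_k p_k \cos(\beta_k x) \Big)^2 \le \sum_k p_k \cos^2(\beta_k x) = \sum_k \frac{p_k}{2} \big( 1 + \cos(2\beta_k x) \big) = \{1 + g(2x)\}/2.$$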
= 


SOLUTION FOR EXERCISE 1.6. We first expand the sum

$$\sum_{k=1}^n (p_k + 1/p_k)^2 = 2n + \sum_{k=1}^n p_k^2 + \sum_{k=1}^n 1/p_k^2, \tag{14.44}$$

and then we estimate the last two terms separately. By the 1-trick and the hypothesis $p_1 + p_2 + \cdots + p_n = 1$, the first of these two sums is at least $1/n$. To estimate the last sum in (14.44), we first apply Cauchy’s inequality to the sum of the products $1 = \sqrt{p_k} \cdot (1/\sqrt{p_k})$ to get

$$n^2 \le \sum_{k=1}^n 1/p_k,$$

and to complete the proof we apply Cauchy’s inequality to the sum of the products $1/p_k = 1 \cdot 1/p_k$ to get

$$n^4 \le \Big( \sum_{k=1}^n 1/p_k \Big)^2 \le n \sum_{k=1}^n 1/p_k^2.$$

There are several other solutions to this problem, but this one does an especially nice job of illustrating how much can be achieved with just Cauchy’s inequality and the 1-trick.


SOLUTION FOR EXERCISE 1.7. The natural candidate for the inner product is given by $\langle x, y \rangle = 5x_1y_1 + x_1y_2 + x_2y_1 + 3x_2y_2$ where one has set $x = (x_1, x_2)$ and $y = (y_1, y_2)$. All of the required inner product properties are immediate, except perhaps for the first two. For these we just need to note that $\langle x, x \rangle = 5x_1^2 + 2x_1x_2 + 3x_2^2$ and that the polynomial $5z^2 + 2z + 3$ has no real roots.

More generally, if $a_{jk}$, $1 \le j, k \le n$, is a square array of real numbers that is symmetric in the sense that $a_{jk} = a_{kj}$ for all $1 \le j, k \le n$, then the sum

$$\langle x, y \rangle = \sum_{j=1}^n \sum_{k=1}^n a_{jk} x_j y_k \tag{14.45}$$

provides a candidate for inner products on $\mathbb{R}^n$. The candidate (14.45) yields a legitimate inner product on $\mathbb{R}^n$ if (a) the polynomial defined by $Q(x_1, x_2, \ldots, x_n) = \sum_{j,k=1}^n a_{jk} x_j x_k$ is nonnegative for all vectors $(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and if (b) $Q(x_1, x_2, \ldots, x_n) = 0$ only when $x_j = 0$ for all $1 \le j \le n$. A polynomial with these two properties is called a positive definite quadratic form, and each such form provides us with a potentially useful version of Cauchy’s inequality.
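Conditions (a) and (b) can be tested mechanically for a small symmetric array. The Python sketch below uses Sylvester’s criterion (all leading principal minors positive), a standard test that the text itself does not invoke; the function names are ours. It confirms that the array with rows $(5, 1)$ and $(1, 3)$ behind the inner product above is positive definite, and it spot-checks the resulting Cauchy inequality $\langle x, y \rangle^2 \le \langle x, x \rangle \langle y, y \rangle$ on integer vectors.

```python
from fractions import Fraction
import random


def inner(x, y):
    """The candidate <x, y> = 5 x1 y1 + x1 y2 + x2 y1 + 3 x2 y2 from the text."""
    return 5 * x[0] * y[0] + x[0] * y[1] + x[1] * y[0] + 3 * x[1] * y[1]


def is_positive_definite(a):
    """Test a small symmetric matrix a = (a_jk) for positive definiteness.

    Gaussian elimination without row exchanges: the pivots are ratios of
    consecutive leading principal minors, so all pivots are positive exactly
    when all leading minors are (Sylvester's criterion)."""
    m = [[Fraction(v) for v in row] for row in a]
    n = len(m)
    for k in range(n):
        if m[k][k] <= 0:
            return False
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n):
                m[i][j] -= factor * m[k][j]
    return True


print(is_positive_definite([[5, 1], [1, 3]]))  # True
print(is_positive_definite([[1, 2], [2, 1]]))  # False

rng = random.Random(6)
for _ in range(1000):
    x = [rng.randint(-9, 9) for _ in range(2)]
    y = [rng.randint(-9, 9) for _ in range(2)]
    assert inner(x, y) ** 2 <= inner(x, x) * inner(y, y)
```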


SOLUTION FOR EXERCISE 1.8. In each case, one applies Cauchy’s inequality, and then estimates the resulting sum. In part (a) one uses the sum for a geometric progression: $1 + x^2 + x^4 + x^6 + \cdots = 1/(1 - x^2)$, while for part (b), one can use Euler’s famous formula

$$\sum_{k=1}^\infty \frac{1}{k^2} = \frac{\pi^2}{6} < 2,$$

or, alternatively, one can use the nice telescoping argument,

$$\sum_{k=1}^n \frac{1}{k^2} \le 1 + \sum_{k=2}^n \frac{1}{(k-1)k} = 1 + \sum_{k=2}^n \Big( \frac{1}{k-1} - \frac{1}{k} \Big) = 2 - \frac{1}{n}.$$

For part (c) one has the integral comparison

$$\sum_{k=1}^n \frac{1}{n+k} \le \sum_{k=1}^n \int_{n+k-1}^{n+k} \frac{dx}{x} = \int_n^{2n} \frac{dx}{x} = \log 2.$$

Finally, for part (d) one uses the explicit sum for the squares of the binomial coefficients

$$\sum_{k=0}^n \binom{n}{k}^2 = \sum_{k=0}^n \binom{n}{k} \binom{n}{n-k} = \binom{2n}{n},$$

which one can prove by a classic counting argument. Specifically, one considers the number of ways to form a committee of $n$ people from a group of $n$ men and $n$ women. The middle sum first counts the number of committees with $k$ men and then sums over $0 \le k \le n$, while the last term directly counts the number of ways to choose $n$ people out of $2n$.


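SOLUTION FOR EXERCISE 1.9. If $T$ denotes the left-hand side of the target inequality, then by expansion one gets

$$T = 2\sum_{j=1}^n a_j^2 + 4 \sum_{(j,k) \in S} a_ja_k,$$

where $S$ is the set of all $(j, k)$ such that $1 \le j < k \le n$ with $j + k$ even. From the elementary bound $2a_ja_k \le a_j^2 + a_k^2$, one then finds

$$T \le 2\sum_{j=1}^n a_j^2 + 2 \sum_{(j,k) \in S} (a_j^2 + a_k^2) = 2\sum_{j=1}^n a_j^2 + 2 \sum_{s=1}^n n_s a_s^2,$$

where $n_s$ denotes the number of pairs $(j, k)$ in $S$ with $j = s$ or $k = s$. One has $n_s \le \lfloor (n-1)/2 \rfloor$, so

$$T \le \big( 2 + 2\lfloor (n-1)/2 \rfloor \big) \sum_{j=1}^n a_j^2 \le (n+2) \sum_{j=1}^n a_j^2.$$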


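SOLUTION FOR EXERCISE 1.10. If we apply Cauchy’s inequality to the splitting $|a_{jk} x_j y_k| = |a_{jk}|^{1/2} |x_j| \cdot |a_{jk}|^{1/2} |y_k|$ we find

$$\Big| \sum_j \sum_k a_{jk} x_j y_k \Big|^2 \le \Big( \sum_j \Big\{ \sum_k |a_{jk}| \Big\} |x_j|^2 \Big) \Big( \sum_k \Big\{ \sum_j |a_{jk}| \Big\} |y_k|^2 \Big),$$

and the sums in the braces are bounded by $C$ and $R$ respectively.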
SOLUTION FOR EXERCISE 1.11. Only a few alterations are needed in Schwarz’s original proof (page 11), but the visual impression does shift. First, we apply the hypothesis and the definition of $p(t)$ to find

$$0 \le p(t) = \langle v, v \rangle + 2t \langle v, w \rangle + t^2 \langle w, w \rangle.$$

The discriminant of $p(t)$ is $D = B^2 - AC = \langle v, w \rangle^2 - \langle v, v \rangle \langle w, w \rangle$, and we deduce that $D \le 0$, or else $p(t)$ would have two real roots (and therefore $p(t)$ would be strictly negative for some value of $t$).
SOLUTION FOR EXERCISE 1.12. We define a new inner product space $(V^{[n]}, [\cdot, \cdot])$ by setting $V^{[n]} = \{(v_1, v_2, \ldots, v_n) : v_j \in V, 1 \le j \le n\}$ and by defining $[\mathbf{v}, \mathbf{w}] = \sum_{j=1}^n \langle v_j, w_j \rangle$ where $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ and where $\mathbf{w} = (w_1, w_2, \ldots, w_n)$. After checking that $[\cdot, \cdot]$ is an honest inner product, one sees that the bound (1.24) is just the Cauchy–Schwarz inequality for the inner product $[\cdot, \cdot]$.


SOLUTION FOR EXERCISE 1.13. If we view $\{x_{jk} : 1 \le j \le m, 1 \le k \le n\}$ as a vector of length $mn$ then Cauchy’s inequality and the one-trick splitting $x_{jk} = x_{jk} \cdot 1$ imply the general bound

$$\Big( \sum_{j=1}^m \sum_{k=1}^n x_{jk} \Big)^2 \le mn \sum_{j=1}^m \sum_{k=1}^n x_{jk}^2. \tag{14.46}$$

We apply this bound to $x_{jk} = a_{jk} - r_j/n - c_k/m$ where

$$r_j = \sum_{k=1}^n a_{jk}, \qquad c_k = \sum_{j=1}^m a_{jk}, \qquad\text{and if we set}\quad T = \sum_{j=1}^m \sum_{k=1}^n a_{jk},$$

then the left side of the bound (14.46) works out to be $T^2$, and the right side works out to be

$$mn \sum_{j=1}^m \sum_{k=1}^n a_{jk}^2 - m \sum_{j=1}^m r_j^2 - n \sum_{k=1}^n c_k^2 + 2T^2,$$

so the Cauchy bound (14.46) reduces to our target inequality.

To characterize the case of equality, we note that equality holds in the bound (14.46) if and only if $x_{jk}$ is equal to a constant $c$, in which case one can take $\alpha_j = c + r_j/n$ and $\beta_k = c_k/m$ to provide the required representation $a_{jk} = \alpha_j + \beta_k$. This result is Theorem 1 of van Dam (1998) where one also finds a proof which uses matrix theory as well as some instructive corollaries.
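The algebra behind the closed form of the right side of (14.46) can be confirmed with exact rational arithmetic. In the sketch below (names ours), `fractions.Fraction` keeps every quantity exact, so the identities are checked with `==` rather than with a tolerance; the last lines check the equality case $a_{jk} = \alpha_j + \beta_k$.

```python
from fractions import Fraction
import random


def van_dam(a):
    """Return (T^2, left of (14.46), right of (14.46), closed form) for the matrix a."""
    m, n = len(a), len(a[0])
    a = [[Fraction(v) for v in row] for row in a]
    r = [sum(row) for row in a]                              # row sums r_j
    c = [sum(a[j][k] for j in range(m)) for k in range(n)]   # column sums c_k
    T = sum(r)
    x = [[a[j][k] - r[j] / n - c[k] / m for k in range(n)] for j in range(m)]
    left = sum(sum(row) for row in x) ** 2
    right = m * n * sum(v * v for row in x for v in row)
    closed = (m * n * sum(v * v for row in a for v in row)
              - m * sum(v * v for v in r) - n * sum(v * v for v in c) + 2 * T * T)
    return T * T, left, right, closed


rng = random.Random(7)
for _ in range(200):
    m, n = rng.randint(1, 5), rng.randint(1, 5)
    a = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(m)]
    T2, left, right, closed = van_dam(a)
    assert left == T2 and right == closed and left <= right

# Equality case: a_jk = alpha_j + beta_k makes every x_jk the same constant.
alpha, beta = [1, 4, -2], [3, 0]
T2, left, right, closed = van_dam([[p + q for q in beta] for p in alpha])
print(left == right)  # True
```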


SOLUTION FOR EXERCISE 1.14. More often than one might like to admit, tidiness is important in problem solving, and here the hygienic use of parentheses can make the difference between success and failure. One just carefully computes

$$\begin{aligned}
\sum_{1 \le i,j,k \le n} a_{ij} b_{jk} c_{ki} &= \sum_{1 \le i,k \le n} c_{ki} \Big\{ \sum_{j=1}^n a_{ij} b_{jk} \Big\} \\
&\le \Big( \sum_{1 \le i,k \le n} c_{ki}^2 \Big)^{1/2} \Big( \sum_{1 \le i,k \le n} \Big\{ \sum_{j=1}^n a_{ij} b_{jk} \Big\}^2 \Big)^{1/2} \\
&\le \Big( \sum_{1 \le i,k \le n} c_{ki}^2 \Big)^{1/2} \Big( \sum_{1 \le i,k \le n} \Big\{ \sum_{j=1}^n a_{ij}^2 \Big\} \Big\{ \sum_{j=1}^n b_{jk}^2 \Big\} \Big)^{1/2} \\
&= \Big( \sum_{1 \le i,j \le n} a_{ij}^2 \Big)^{1/2} \Big( \sum_{1 \le j,k \le n} b_{jk}^2 \Big)^{1/2} \Big( \sum_{1 \le k,i \le n} c_{ki}^2 \Big)^{1/2}.
\end{aligned}$$

This proof of the triple product bound (1.25) follows Tiskin (2002). Incidentally, the corollary (1.26) was posed as a problem on the 33rd International Mathematical Olympiad (Moscow, 1992). More recently, Hammer and Shen (2002) note that the corollary may be obtained as an application of Kolmogorov complexity. George (1984, p. 243) outlines a proof of the continuous Loomis–Whitney inequality, a result which can be used to give a third proof of the discrete bound (1.26).
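The triple product bound says that the left-hand sum, which is the trace of the matrix product $abc$, is at most the product of the three Frobenius norms. A short randomized check in Python, offered only as an illustration (the function names are ours):

```python
import math
import random


def triple_product(a, b, c):
    """Return (sum_{i,j,k} a_ij b_jk c_ki, product of the three Frobenius norms)."""
    n = len(a)
    lhs = sum(a[i][j] * b[j][k] * c[k][i]
              for i in range(n) for j in range(n) for k in range(n))

    def frob(mat):
        return math.sqrt(sum(v * v for row in mat for v in row))

    return lhs, frob(a) * frob(b) * frob(c)


rng = random.Random(8)
for _ in range(500):
    n = rng.randint(1, 6)
    a, b, c = ([[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)] for _ in range(3))
    lhs, rhs = triple_product(a, b, c)
    assert lhs <= rhs + 1e-9
```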


SOLUTION FOR EXERCISE 1.15. If we differentiate the identities (1.27) and (1.28) we find for all $\theta \in \Theta$ that

$$\sum_{k \in D} p_\theta(k; \theta) = 0 \quad\text{and}\quad \sum_{k \in D} g(k) p_\theta(k; \theta) = 1.$$

Consequently, we have the identity

$$1 = \sum_{k \in D} (g(k) - \theta) p_\theta(k; \theta) = \sum_{k \in D} \big\{ (g(k) - \theta) p(k; \theta)^{1/2} \big\} \big\{ \big( p_\theta(k; \theta) / p(k; \theta) \big) p(k; \theta)^{1/2} \big\},$$

which yields the Cramér–Rao inequality (1.29) when we apply Cauchy’s inequality to this sum of bracketed terms.

The derivation of the Cramér–Rao inequality may be the most significant application of the 1-trick in all of applied mathematics. It has been repeated in hundreds of papers and books.


CHAPTER 2: THE AM-GM INEQUALITY 


SOLUTION FOR EXERCISE 2.1. For the general step, consider the sum $S_{k+1} = a_1b_1 + a_2b_2 + \cdots + a_{2^{k+1}}b_{2^{k+1}} = S'_{k+1} + S''_{k+1}$ where $S'_{k+1}$ is the sum of the first $2^k$ products and $S''_{k+1}$ is the sum of the second $2^k$ products. By induction, apply the $2^k$-version of Cauchy’s inequality to $S'_{k+1}$ and $S''_{k+1}$ to get $S'_{k+1} \le A'B'$ and $S''_{k+1} \le A''B''$ where we set $A' = (a_1^2 + \cdots + a_{2^k}^2)^{1/2}$, $A'' = (a_{2^k+1}^2 + \cdots + a_{2^{k+1}}^2)^{1/2}$, and where we define $B'$ and $B''$ analogously. The 2-version of Cauchy’s inequality implies

$$S_{k+1} \le A'B' + A''B'' \le (A'^2 + A''^2)^{1/2} (B'^2 + B''^2)^{1/2},$$

and this is the $2^{k+1}$-version of Cauchy’s inequality. Thus, induction gives us Cauchy’s inequality for all $2^k$, $k = 1, 2, \ldots$. Finally, to get Cauchy’s inequality for $n \le 2^k$ we just set $a_j = b_j = 0$ for $n < j \le 2^k$ and apply the $2^k$-version.


SOLUTION FOR EXERCISE 2.2. To prove the bound (2.23) by induction, first note that the case $n = 1$ is trivial. Next, take the bound for general $n$ and multiply it by $1 + x$ to get $1 + (n+1)x + nx^2 \le (1+x)^{n+1}$. This is stronger than the bound (2.23) in the case $n + 1$, so the bound (2.23) holds for all $n = 1, 2, \ldots$ by induction. To show $1 + x \le e^x$, one replaces $x$ by $x/n$ in Bernoulli’s inequality and lets $n$ go to infinity. Finally, to prove the relation (2.25), one sets $f(x) = (1+x)^p - (1 + px)$ then notes that $f(0) = 0$, $f'(x) > 0$ for $x > 0$, and $f'(x) < 0$ for $-1 < x < 0$, so $\min_{x \in [-1, \infty)} f(x) = f(0) = 0$.


SOLUTION FOR EXERCISE 2.3. To prove the bound (2.26) one takes $p_1 = \alpha/(\alpha + \beta)$, $p_2 = \beta/(\alpha + \beta)$, $a_1 = x^{\alpha + \beta}$, and $a_2 = y^{\alpha + \beta}$ and applies the AM-GM bound (2.7). To get the timely bound we specialize (2.26) twice, once with $\alpha = 2004$ and $\beta = 1$ and once with $\alpha = 1$ and $\beta = 2004$. We then sum the two resulting bounds.


SOLUTION FOR EXERCISE 2.4. The target inequality is equivalent to $a^2bc + ab^2c + abc^2 \le a^4 + b^4 + c^4$, a pure power bound. By the AM-GM inequality, we have $a^2bc = (a^4)^{2/4} (b^4)^{1/4} (c^4)^{1/4} \le 2a^4/4 + b^4/4 + c^4/4$, and analogous bounds hold for $ab^2c$ and $abc^2$. The sum of these bounds yields the target inequality.

Equality holds in the target inequality if and only if equality holds for all three of our applications of the AM-GM bound. Thus, equality holds in the target bound if and only if $a = b = c$. Incidentally, three other solutions of this problem are available on the website of the Canadian Mathematical Association.


SOLUTION FOR EXERCISE 2.5. For all $j$ and $k$, the AM-GM inequality gives us $(x^{j+k} y^{j+k})^{1/2} \le (x^j y^k + x^k y^j)/2$. Setting $k = n - 1 - j$ and summing over $0 \le j < n$ yields the bound

$$n (xy)^{(n-1)/2} \le x^{n-1} + x^{n-2}y + \cdots + xy^{n-2} + y^{n-1} = \frac{x^n - y^n}{x - y}.$$


SOLUTION FOR EXERCISE 2.6. Since $\alpha + \beta = \pi/2$ we have $\gamma = \alpha$ and $\delta = \beta$, so the triangles $\Delta(ABD)$ and $\Delta(DBC)$ are similar. By proportionality of the corresponding sides we have $h : a = b : h$, and we find $h^2 = ab$, just as required.


SOLUTION FOR EXERCISE 2.7. The product $(1+x)(1+y)(1+z)$ expands as $1 + x + y + z + xy + xz + yz + xyz$ and the AM-GM bound gives us

$$(x + y + z)/3 \ge (xyz)^{1/3} \ge 1 \quad\text{and}\quad (xy + xz + yz)/3 \ge \{(xy)(xz)(yz)\}^{1/3} = (xyz)^{2/3} \ge 1,$$

so the bound (2.28) follows by summing. With persistence, the same idea can be used to show that for all nonnegative $a_k$, $1 \le k \le n$, one has the inference

$$1 \le \prod_{k=1}^n a_k \implies 2^n \le \prod_{k=1}^n (1 + a_k). \tag{14.47}$$


SOLUTION FOR EXERCISE 2.8. The AM-GM inequality tells us

$$\{a_1x_1 \, a_2x_2 \cdots a_nx_n\}^{1/n} \le \frac{a_1x_1 + a_2x_2 + \cdots + a_nx_n}{n},$$

and this yields a relation between the critical quantities of $P_1$ and $P_2$,

$$x_1x_2 \cdots x_n \le \frac{(a_1x_1 + a_2x_2 + \cdots + a_nx_n)^n}{n^n a_1a_2 \cdots a_n}.$$

We have equality here if and only if $a_1x_1 = a_2x_2 = \cdots = a_nx_n$, and nothing more is needed to confirm the stated optimality criterion.
SOLUTION FOR EXERCISE 2.9. By the AM-GM inequality, one has

$$2\{a^2b^2c^2\}^{1/3} = \{(2ab)(2ac)(2bc)\}^{1/3} \le \frac{2ab + 2ac + 2bc}{3} = A/3,$$

and this gives the bound (2.9). Finally, equality holds here if and only if $ab = ac = bc$. This is possible if and only if $a = b = c$, so the box of maximum volume for a given surface area is indeed the cube.


SOLUTION FOR EXERCISE 2.10. If we set $p = n$ and $x = y - 1$ in Bernoulli’s inequality, we find that $y(n - y^{n-1}) \le n - 1$ and equality holds only for $y = 1$. If we now choose $y$ such that $y^{n-1} = a_n/\bar{a}$ where $\bar{a} = (a_1 + a_2 + \cdots + a_n)/n$, then we have $n - y^{n-1} = (a_1 + a_2 + \cdots + a_{n-1})/\bar{a}$, and easy arithmetic takes one the rest of the way to the recursion formula.

As a sidebar, one should note that the recursion also follows from the weighted AM-GM inequality $x^{1/n} y^{(n-1)/n} \le x/n + (n-1)y/n$ by taking $x = a_n$ and $y = (a_1 + a_2 + \cdots + a_{n-1})/(n-1)$.


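SOLUTION FOR EXERCISE 2.11. Following the hint, one finds from the AM-GM inequality that

$$\frac{(a_1a_2 \cdots a_n)^{1/n} + (b_1b_2 \cdots b_n)^{1/n}}{\{(a_1 + b_1)(a_2 + b_2) \cdots (a_n + b_n)\}^{1/n}} = \prod_{j=1}^n \Big( \frac{a_j}{a_j + b_j} \Big)^{1/n} + \prod_{j=1}^n \Big( \frac{b_j}{a_j + b_j} \Big)^{1/n} \le \frac{1}{n} \sum_{j=1}^n \frac{a_j}{a_j + b_j} + \frac{1}{n} \sum_{j=1}^n \frac{b_j}{a_j + b_j} = 1,$$

and the proof is complete. The division device is decisive here, and as the introduction to the exercise suggests, this is not an isolated instance.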


SOLUTION FOR EXERCISE 2.12. As Figure 2.4 suggests, we have the bound $f(x) = x/e^{x-1} \le 1$ for all $x \ge 0$. In fact, we used this bound long ago (page 24); it was the key to Pólya’s proof of the AM-GM inequality. If we now write $c_k = a_k/A$, then we have $c_1 + c_2 + \cdots + c_n = n$, and from this fact we see that for each $k$ we have

$$\prod_{j=1}^n c_j = c_k \prod_{j \ne k} c_j \le c_k \prod_{j \ne k} e^{c_j - 1} = c_k e^{1 - c_k} = c_k / e^{c_k - 1} = f(c_k).$$

Since $\rho = (A - G)/A$ and $c_k = a_k/A$ we have for all $k = 1, 2, \ldots, n$ that

$$(1 - \rho)^n = \frac{a_1a_2 \cdots a_n}{A^n} \le \frac{a_k/A}{\exp(a_k/A - 1)} = f(a_k/A).$$

Now the bounds (2.33) are immediate from the definition of $\rho_-$, $\rho_+$, together with the fact that $f$ is strictly increasing on $[0, 1)$ and strictly decreasing on $(1, \infty)$.

This solution was given by Gábor Szegő in 1914 in response to a question posed by George Pólya. It is among the earliest of their many joint efforts; at the time, Szegő was just 19.


SOLUTION FOR EXERCISE 2.13. In general one has $|w| \ge |\operatorname{Re} w|$ and $\operatorname{Re}(w + z) = \operatorname{Re}(w) + \operatorname{Re}(z)$, so from $\operatorname{Re} z_j = \rho_j \cos\theta_j$ we find

$$\begin{aligned}
|z_1 + z_2 + \cdots + z_n| &\ge |\operatorname{Re}(z_1 + z_2 + \cdots + z_n)| \\
&= |z_1| \cos\theta_1 + |z_2| \cos\theta_2 + \cdots + |z_n| \cos\theta_n \\
&\ge (|z_1| + |z_2| + \cdots + |z_n|) \cos\chi \\
&\ge n (|z_1| |z_2| \cdots |z_n|)^{1/n} \cos\chi,
\end{aligned}$$

where we first used the fact that cosine is monotone decreasing on $[0, \pi/2]$ and then we applied the AM-GM inequality to the nonnegative real numbers $|z_j|$, $j = 1, 2, \ldots, n$. This exercise is based on Wilf (1963). Mitrinović (1970) notes that versions of this bound may be traced back at least to Petrovitch (1917). There are also informative generalizations given by Diaz and Metcalf (1966).


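SOLUTION FOR EXERCISE 2.14. Take $x \ge 0$ and $y \ge 0$ and consider the hypothesis $H(n)$: $((x + y)/2)^n \le (x^n + y^n)/2$. To prove $H(n+1)$ we note by $H(n)$ that

$$\begin{aligned}
\Big( \frac{x + y}{2} \Big)^{n+1} &= \Big( \frac{x + y}{2} \Big) \Big( \frac{x + y}{2} \Big)^n \le \Big( \frac{x + y}{2} \Big) \Big( \frac{x^n + y^n}{2} \Big) \\
&= \frac{x^{n+1} + y^{n+1} + xy^n + yx^n}{4} \\
&= \frac{x^{n+1} + y^{n+1}}{2} - \frac{(x - y)(x^n - y^n)}{4} \le \frac{x^{n+1} + y^{n+1}}{2}.
\end{aligned}$$

Induction then confirms the validity of $H(n)$ for all $n \ge 1$. Now by $H(n)$ applied twice we find

$$\begin{aligned}
\Big( \frac{x_1 + x_2 + x_3 + x_4}{4} \Big)^n &= \Big( \frac{1}{2} \Big\{ \frac{x_1 + x_2}{2} + \frac{x_3 + x_4}{2} \Big\} \Big)^n \le \frac{1}{2} \Big\{ \Big( \frac{x_1 + x_2}{2} \Big)^n + \Big( \frac{x_3 + x_4}{2} \Big)^n \Big\} \\
&\le \frac{1}{2} \Big\{ \frac{x_1^n + x_2^n}{2} + \frac{x_3^n + x_4^n}{2} \Big\} = \frac{x_1^n + x_2^n + x_3^n + x_4^n}{4},
\end{aligned}$$

and this argument can be repeated to show that for each $k$ and each set of $2^k$ nonnegative real numbers $x_1, x_2, \ldots, x_{2^k}$ we have

$$\Big( \frac{x_1 + x_2 + \cdots + x_{2^k}}{2^k} \Big)^n \le \frac{x_1^n + x_2^n + \cdots + x_{2^k}^n}{2^k}. \tag{14.48}$$

Cauchy’s trick of padding a sequence of length $m$ with extra terms to get a sequence of length $2^k$ now runs into difficulty, so a new twist is needed. One idea that works is to use a full backwards induction. Specifically, we now let $H_{\text{new}}(m)$ denote the hypothesis that

$$\Big( \frac{x_1 + x_2 + \cdots + x_m}{m} \Big)^n \le \frac{x_1^n + x_2^n + \cdots + x_m^n}{m} \tag{14.49}$$

for any set of $m$ nonnegative real numbers $x_1, x_2, \ldots, x_m$. We already know that $H_{\text{new}}(m)$ is valid when $m$ is any power of two, so to prove that $H_{\text{new}}(m)$ is valid for all $m = 1, 2, \ldots$ we just need to show that for $m \ge 2$, the hypothesis $H_{\text{new}}(m)$ implies $H_{\text{new}}(m - 1)$.

Given $m - 1$ nonnegative reals $S = \{x_1, x_2, \ldots, x_{m-1}\}$, we introduce a new variable $y$ by setting $y = (x_1 + x_2 + \cdots + x_{m-1})/(m - 1)$. Since $y$ is equal to $(x_1 + x_2 + \cdots + x_{m-1} + y)/m$, we see that when we apply $H_{\text{new}}(m)$ to the $m$-element set $S \cup \{y\}$, we obtain the bound

$$y^n \le \frac{x_1^n + x_2^n + \cdots + x_{m-1}^n + y^n}{m},$$

and, when we clear $y^n$ to the left side, we find

$$\Big( \frac{x_1 + x_2 + \cdots + x_{m-1}}{m - 1} \Big)^n = y^n \le \frac{x_1^n + x_2^n + \cdots + x_{m-1}^n}{m - 1}.$$

This inequality is precisely what one needed to establish the validity of $H_{\text{new}}(m - 1)$, so the solution of the problem is complete. This solution is guided by the one given by Shklarsky, Chentzov, and Yaglom (1993, pp. 391–392).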


CHAPTER 3: LAGRANGE’S IDENTITY AND MINKOWSKI’S CONJECTURE 


SOLUTION FOR EXERCISE 3.1. From the four geometric tautologies,

$$\cos\alpha = \frac{a_1}{\sqrt{a_1^2 + a_2^2}}, \quad \sin\alpha = \frac{a_2}{\sqrt{a_1^2 + a_2^2}}, \quad \cos\beta = \frac{b_1}{\sqrt{b_1^2 + b_2^2}}, \quad \sin\beta = \frac{-b_2}{\sqrt{b_1^2 + b_2^2}},$$

and the two trigonometric identities,

$$\cos(\alpha + \beta) = \cos\alpha \cos\beta - \sin\alpha \sin\beta = \frac{a_1b_1 + a_2b_2}{\sqrt{b_1^2 + b_2^2} \sqrt{a_1^2 + a_2^2}},$$

$$\sin(\alpha + \beta) = \sin\alpha \cos\beta + \cos\alpha \sin\beta = \frac{a_2b_1 - a_1b_2}{\sqrt{b_1^2 + b_2^2} \sqrt{a_1^2 + a_2^2}},$$

we find the Pythagorean path to the identity of Diophantus:

$$1 = \cos^2(\alpha + \beta) + \sin^2(\alpha + \beta) = \frac{(a_1b_1 + a_2b_2)^2 + (a_1b_2 - a_2b_1)^2}{(b_1^2 + b_2^2)(a_1^2 + a_2^2)}.$$


SOLUTION FOR EXERCISE 3.2. Here we just prove the identity of Diophantus since Brahmagupta’s identity is analogous. As expected, one first factors. What is amusing is how one then recombines twice:

$$\begin{aligned}
(x_1^2 + x_2^2)(y_1^2 + y_2^2) &= (x_1 - ix_2)(x_1 + ix_2)(y_1 - iy_2)(y_1 + iy_2) \\
&= \{(x_1 - ix_2)(y_1 + iy_2)\} \{(x_1 + ix_2)(y_1 - iy_2)\}.
\end{aligned}$$

The first factor is $\{(x_1y_1 + x_2y_2) + i(x_1y_2 - x_2y_1)\}$ and the second factor is its conjugate $\{(x_1y_1 + x_2y_2) - i(x_1y_2 - x_2y_1)\}$ so these have product $(x_1y_1 + x_2y_2)^2 + (x_1y_2 - x_2y_1)^2$, a computation which reveals the power of the factorization $a^2 + b^2 = (a + ib)(a - ib)$ in a most remarkable way.
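Both recombination steps can be verified exactly by representing Gaussian integers as pairs of Python integers, so no floating point rounding enters. The helper `gmul` is our own name.

```python
import random


def gmul(p, q):
    """Multiply Gaussian integers represented as pairs (re, im)."""
    return (p[0] * q[0] - p[1] * q[1], p[0] * q[1] + p[1] * q[0])


rng = random.Random(9)
for _ in range(1000):
    x1, x2, y1, y2 = (rng.randint(-50, 50) for _ in range(4))
    first = gmul((x1, -x2), (y1, y2))    # (x1 - i x2)(y1 + i y2)
    second = gmul((x1, x2), (y1, -y2))   # (x1 + i x2)(y1 - i y2)
    assert first == (x1 * y1 + x2 * y2, x1 * y2 - x2 * y1)
    assert second == (first[0], -first[1])  # the second factor is the conjugate
    assert (x1**2 + x2**2) * (y1**2 + y2**2) == first[0] ** 2 + first[1] ** 2
```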


SOLUTION FOR EXERCISE 3.3. One can pass from the discrete identity to a continuous version by appealing to the definition of the Riemann integral as a limit of sums, but it is both easier and more informative to consider the anti-symmetric form $s(x, y) = f(x)g(y) - g(x)f(y)$ and to integrate $s^2(x, y)$ over the square $[a, b]^2$. In this way one finds

$$\frac{1}{2} \int_a^b \int_a^b \{f(x)g(y) - f(y)g(x)\}^2\,dx\,dy = \int_a^b f^2(x)\,dx \int_a^b g^2(x)\,dx - \Big\{ \int_a^b f(x)g(x)\,dx \Big\}^2, \tag{14.50}$$

provided that all of the indicated integrals are well defined. Incidentally, anti-symmetric forms often merit exploration. Surprisingly often, they lead us to useful algebraic relations.


SOLUTION FOR EXERCISE 3.4. The two sides of the proposed inequality can be written respectively as

$$A = \Big\{ x \sum_{k=1}^n a_k \sum_{j=1}^n b_j + (1 - x) \sum_{j=1}^n a_jb_j \Big\}^2$$

and as

$$B = \Big\{ x \Big( \sum_{j=1}^n a_j \Big)^2 + (1 - x) \sum_{j=1}^n a_j^2 \Big\} \Big\{ x \Big( \sum_{j=1}^n b_j \Big)^2 + (1 - x) \sum_{j=1}^n b_j^2 \Big\},$$

from which one finds that $B - A$ can be written as the sum of the term

$$x(1 - x) \sum_{j=1}^n \Big( b_j \sum_{k=1}^n a_k - a_j \sum_{k=1}^n b_k \Big)^2 \quad\text{and the term}\quad (1 - x)^2 \Big\{ \sum_{j=1}^n a_j^2 \sum_{j=1}^n b_j^2 - \Big( \sum_{j=1}^n a_jb_j \Big)^2 \Big\}.$$

The first term is a sum of squares and the second term is nonnegative by Cauchy’s inequality. Thus, $B - A$ is the sum of two nonnegative terms, and the solution is complete. The inequality of the problem is from Wagner (1965) and the solution is from Flor (1965).
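The two-term decomposition of $B - A$ can be checked numerically. The sketch assumes, as the argument requires, that the parameter $x$ lies in $[0, 1]$; the function name is ours.

```python
import random


def wagner_parts(a, b, x):
    """A, B and the two terms whose sum is B - A in the solution of Exercise 3.4."""
    Sa, Sb = sum(a), sum(b)
    Saa = sum(v * v for v in a)
    Sbb = sum(v * v for v in b)
    Sab = sum(u * v for u, v in zip(a, b))
    A = (x * Sa * Sb + (1 - x) * Sab) ** 2
    B = (x * Sa ** 2 + (1 - x) * Saa) * (x * Sb ** 2 + (1 - x) * Sbb)
    squares = x * (1 - x) * sum((bj * Sa - aj * Sb) ** 2 for aj, bj in zip(a, b))
    cauchy = (1 - x) ** 2 * (Saa * Sbb - Sab ** 2)
    return A, B, squares, cauchy


rng = random.Random(10)
for _ in range(1000):
    n = rng.randint(1, 8)
    a = [rng.uniform(-2, 2) for _ in range(n)]
    b = [rng.uniform(-2, 2) for _ in range(n)]
    x = rng.uniform(0, 1)
    A, B, squares, cauchy = wagner_parts(a, b, x)
    assert abs((B - A) - (squares + cauchy)) < 1e-8
    assert squares >= 0 and cauchy >= -1e-9
```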


SOLUTION FOR EXERCISE 3.5. Since $f$ is nonnegative and nondecreasing one has the integral inequality

$$0 \le \int_0^1 \int_0^1 f(x) f(y) (x - y) \big( f(x) - f(y) \big)\,dx\,dy$$

since the integrand is nonnegative. One may now complete the proof by simple expansion. Incidentally, this way of exploiting monotonicity is exceptionally rich, and several variations on this theme are explored at length in Chapter 5.

SOLUTION FOR EXERCISE 3.6. One expands and then factors

$$\begin{aligned}
D_{n+1} - D_n &= \sum_{j=1}^n a_jb_j + n a_{n+1}b_{n+1} - b_{n+1} \sum_{j=1}^n a_j - a_{n+1} \sum_{j=1}^n b_j \\
&= \sum_{j=1}^n a_j (b_j - b_{n+1}) + a_{n+1} \sum_{j=1}^n (b_{n+1} - b_j) \\
&= \sum_{j=1}^n (a_{n+1} - a_j)(b_{n+1} - b_j) \ge 0.
\end{aligned}$$

According to Mitrinović (1970, p. 206) this elegant observation is due to R. R. Janić. The interaction between order relations and quadratic inequalities is developed more extensively in Chapter 5.
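The telescoped identity can be spot-checked with integers. The definition of $D_n$ is not reproduced in this passage; the sketch assumes $D_n = n \sum_{j=1}^n a_jb_j - (\sum_{j=1}^n a_j)(\sum_{j=1}^n b_j)$, which is the form consistent with the first line of the expansion above.

```python
import random


def D(a, b):
    """Assumed form D_n = n * sum(a_j b_j) - (sum a_j)(sum b_j) with n = len(a)."""
    n = len(a)
    return n * sum(u * v for u, v in zip(a, b)) - sum(a) * sum(b)


rng = random.Random(11)
for _ in range(1000):
    n = rng.randint(1, 10)
    a = [rng.randint(-9, 9) for _ in range(n + 1)]
    b = [rng.randint(-9, 9) for _ in range(n + 1)]
    # D_{n+1} - D_n = sum_j (a_{n+1} - a_j)(b_{n+1} - b_j); index n holds the new term.
    assert D(a, b) - D(a[:n], b[:n]) == sum((a[n] - a[j]) * (b[n] - b[j]) for j in range(n))

# For two nondecreasing sequences every increment is >= 0, so D_n >= D_1 = 0.
a = sorted(rng.randint(0, 20) for _ in range(8))
b = sorted(rng.randint(0, 20) for _ in range(8))
print(all(D(a[:k], b[:k]) <= D(a[:k + 1], b[:k + 1]) for k in range(1, 8)))  # True
```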


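SOLUTION FOR EXERCISE 3.7. In the suggested shorthand, Lagrange’s identity can be written as

$$\langle a, a \rangle \langle b, b \rangle - \langle a, b \rangle^2 = \sum_{j < k} \begin{vmatrix} a_j & b_j \\ a_k & b_k \end{vmatrix}^2,$$

and if we fix $b$ and polarize $a$ with $s$ we find

$$\langle a, s \rangle \langle b, b \rangle - \langle a, b \rangle \langle s, b \rangle = \sum_{j < k} \begin{vmatrix} a_j & b_j \\ a_k & b_k \end{vmatrix} \begin{vmatrix} s_j & b_j \\ s_k & b_k \end{vmatrix}.$$

Now, if we fix $a$ and $s$ and polarize $b$ with $t$ we find

$$\langle a, s \rangle \langle b, t \rangle - \langle a, t \rangle \langle s, b \rangle = \sum_{j < k} \begin{vmatrix} a_j & b_j \\ a_k & b_k \end{vmatrix} \begin{vmatrix} s_j & t_j \\ s_k & t_k \end{vmatrix},$$

which is the shorthand version of the target identity.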
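The final identity is the Cauchy–Binet four letter identity, and because both sides are polynomials with integer coefficients it can be checked exactly on random integer vectors. This Python sketch (helper names ours) does just that.

```python
import random


def det2(p, q, r, s):
    """The 2 x 2 determinant with rows (p, q) and (r, s)."""
    return p * s - q * r


def ip(u, v):
    return sum(x * y for x, y in zip(u, v))


rng = random.Random(12)
for _ in range(500):
    n = rng.randint(1, 7)
    a, b, s, t = ([rng.randint(-9, 9) for _ in range(n)] for _ in range(4))
    lhs = ip(a, s) * ip(b, t) - ip(a, t) * ip(s, b)
    rhs = sum(det2(a[j], b[j], a[k], b[k]) * det2(s[j], t[j], s[k], t[k])
              for j in range(n) for k in range(j + 1, n))
    assert lhs == rhs
```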


SOLUTION FOR EXERCISE 3.8. After expanding the two products, one sees that the difference of the left-hand side and the right-hand side of Milne’s inequality (3.17) can be written as a symmetric sum

$$\sum_{1 \le i < j \le n} \Big\{ a_ib_j + a_jb_i - \frac{(a_jb_j)(a_i + b_i)}{a_j + b_j} - \frac{(a_ib_i)(a_j + b_j)}{a_i + b_i} \Big\}.$$

When each summand is put over the denominator $(a_i + b_i)(a_j + b_j)$, the numerator may be simplified, and one finds that this difference coincides with the definition (3.16) of $R$.


CHAPTER 4: ON GEOMETRY AND SUMS OF SQUARES


SOLUTION FOR EXERCISE 4.1. Each case follows by an application of the triangle inequality to an appropriate sum. Those sums are:

(a) $(x + y + z, x + y + z) = (x, y) + (y, z) + (z, x)$,
(b) $(y, z) = (x, x) + (y - x, z - x)$, and
(c) $(2, 2, 2) \le (x + 1/x, y + 1/y, z + 1/z) = (x, y, z) + (1/x, 1/y, 1/z)$.


SOLUTION FOR EXERCISE 4.2. The derivative on the left is equal to $\langle \nabla f(x), u \rangle$ which is bounded by $\|\nabla f(x)\| \, \|u\| = \|\nabla f(x)\|$ by the Cauchy–Schwarz inequality. On the other hand, the derivative on the right is equal to $\langle \nabla f(x), v \rangle = \|\nabla f(x)\|$ by direct calculation and the definition of $v$. These observations yield the inequality (4.21).

We have equality in the application of the Cauchy–Schwarz inequality only if $u$ and $\nabla f(x)$ are proportional, so the bound (4.21) reduces to an equality if and only if $u = \lambda \nabla f(x)$. Since $u$ is a unit vector, this implies $\lambda = \pm 1/\|\nabla f(x)\|$. Only the positive sign can give equality in the bound (4.21), and in that case we have $u = v$.


SOLUTION FOR EXERCISE 4.3. Direct expansion proves the representation (4.22). To minimize $P(t)$ we solve $P'(t) = 2t \langle w, w \rangle - 2 \langle v, w \rangle = 0$ and find $P(t) \ge P(t_0)$ where $t_0 = \langle v, w \rangle / \langle w, w \rangle$. The evaluation of $P(t_0)$ then leads one to the expression (4.22).


SOLUTION FOR EXERCISE 4.4. This exercise provides a reminder that one sometimes needs a more elaborate algebraic identity to deal with the absolute values of complex numbers than to deal with the absolute values of real numbers. Here the key is to use the Cauchy–Binet four letter identity (3.7) on page 49. The proof of that identity was purely algebraic (no absolute values or complex conjugates were used) so the identity is also valid for complex numbers. One then just makes the replacements $a_k \mapsto a_k$, $b_k \mapsto \bar{b}_k$, $s_k \mapsto \bar{a}_k$, and $t_k \mapsto b_k$.


SOLUTION FOR EXERCISE 4.5. This observation of S. S. Dragomir (2000) shows how the principles behind Lagrange’s identity continue to bear fruit. Here one just takes the natural double sum and expands:

$$\begin{aligned}
0 &\le \frac{1}{2} \sum_{j=1}^n \sum_{k=1}^n p_jp_k \|a_jx_k - a_kx_j\|^2 \\
&= \frac{1}{2} \sum_{j=1}^n \sum_{k=1}^n p_jp_k \big[ a_j^2 \|x_k\|^2 - 2 \langle a_jx_k, a_kx_j \rangle + a_k^2 \|x_j\|^2 \big] \\
&= \sum_{j=1}^n p_ja_j^2 \sum_{k=1}^n p_k \|x_k\|^2 - \sum_{j=1}^n \sum_{k=1}^n p_jp_ka_ja_k \langle x_k, x_j \rangle \\
&= \sum_{j=1}^n p_ja_j^2 \sum_{k=1}^n p_k \|x_k\|^2 - \Big\| \sum_{j=1}^n p_ja_jx_j \Big\|^2.
\end{aligned}$$

This identity gives us our target bound (4.24) and shows that the inequality is strict unless $a_jx_k = a_kx_j$ for all $j$ and $k$. Finally, one should also note that a corresponding inequality for complex inner product spaces can be obtained by a similar calculation.
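The identity is easy to test numerically in $\mathbb{R}^d$ with positive weights $p_j$. The sketch below (function names ours; real vectors only, although the text notes that a complex version also exists) compares the double sum with its closed form.

```python
import random


def ip(u, v):
    return sum(s * t for s, t in zip(u, v))


def dragomir_sides(p, a, xs):
    """Return (double sum, closed form) from the solution of Exercise 4.5 (real case)."""
    n, d = len(p), len(xs[0])
    double_sum = 0.0
    for j in range(n):
        for k in range(n):
            diff = [a[j] * xs[k][i] - a[k] * xs[j][i] for i in range(d)]
            double_sum += 0.5 * p[j] * p[k] * ip(diff, diff)
    weighted = [sum(p[j] * a[j] * xs[j][i] for j in range(n)) for i in range(d)]
    closed = (sum(p[j] * a[j] ** 2 for j in range(n))
              * sum(p[k] * ip(xs[k], xs[k]) for k in range(n))
              - ip(weighted, weighted))
    return double_sum, closed


rng = random.Random(15)
for _ in range(300):
    n, d = rng.randint(1, 6), rng.randint(1, 4)
    p = [rng.uniform(0.1, 1.0) for _ in range(n)]
    a = [rng.uniform(-2, 2) for _ in range(n)]
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    double_sum, closed = dragomir_sides(p, a, xs)
    assert double_sum >= 0 and abs(double_sum - closed) < 1e-9 * (1 + abs(closed))
```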


SOLUTION FOR EXERCISE 4.6.

There are proofs of this inequality that use only the tools of plane
geometry, but there is also an exceptionally interesting proof that uses
the transformation $z \mapsto 1/z$ for complex numbers. There is no loss of
generality in setting $A = 0$, $B = z_1$, $C = z_2$, and $D = z_3$, and the triangle
inequality then gives us
$$
\left| \frac{1}{z_1} - \frac{1}{z_3} \right| \le \left| \frac{1}{z_1} - \frac{1}{z_2} \right| + \left| \frac{1}{z_2} - \frac{1}{z_3} \right|,
$$
which may be rewritten as $|z_2||z_1 - z_3| \le |z_3||z_1 - z_2| + |z_1||z_2 - z_3|$.
After identifying these terms with help from Figure 4.7, we see that it
is precisely Ptolemy's inequality!

To prove the converse, we first note that one has equality in this
application of the triangle inequality if and only if the points $1/z_1$, $1/z_2$, $1/z_3$
lie on a line, with $1/z_2$ between $1/z_1$ and $1/z_3$. One then obtains the required
characterization by appealing to the fact that $z \mapsto 1/z$ takes a circle through
the origin to a line and vice versa.

The transformation $z \mapsto 1/z$ is perhaps the leading example of a
Möbius transformation; more generally, these are the maps of the form
$z \mapsto (az + b)/(cz + d)$. Every book on complex variables examines these
transformations, but the treatment of Needham (1997), pages 122–188,
is especially attractive. Needham also discusses Ptolemy's result with
the help of inversion, but the quick treatment given here is closer to that
of Treibergs (2002).
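A small numerical illustration of both directions: for random points in the plane the Ptolemy gap $AB\cdot CD + AD\cdot BC - AC\cdot BD$ is never negative, and for four points taken in order around a circle it vanishes. The circle, the angles, and the function name are arbitrary choices of ours.

```python
import cmath
import random


def ptolemy_gap(A, B, C, D):
    """AB*CD + AD*BC - AC*BD for complex points A, B, C, D."""
    def dist(u, v):
        return abs(u - v)
    return dist(A, B) * dist(C, D) + dist(A, D) * dist(B, C) - dist(A, C) * dist(B, D)


random.seed(3)
gaps = [ptolemy_gap(*(complex(random.uniform(-5, 5), random.uniform(-5, 5))
                      for _ in range(4)))
        for _ in range(1000)]

# Four points in cyclic order on the circle |z - (3 + 2i)| = 1.5.
cyclic = [3 + 2j + 1.5 * cmath.exp(1j * t) for t in (0.3, 1.1, 2.9, 5.0)]
print(min(gaps), ptolemy_gap(*cyclic))  # min(gaps) >= 0; the cyclic gap is ~0
```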


SOLUTION FOR EXERCISE 4.7. To prove the identity (4.26), expand the
inner product squares and use $1 + \omega + \cdots + \omega^{N-1} = (1 - \omega^N)/(1 - \omega) = 0$.
For the second identity, just expand and integrate. This exercise is based
on D'Angelo (2002, pp. 53–55) where one finds related material.


SOLUTION FOR EXERCISE 4.8. The first part of the recursion (4.28)
gives us $\langle z_k, e_j\rangle = 0$ for all $1 \le j < k$, and this gives us $\langle e_k, e_j\rangle = 0$ for
all $1 \le j < k$. The normalization $\langle e_k, e_k\rangle = 1$ for $1 \le k \le n$ is immediate
from the second part of the recursion (4.28), and the triangular spanning
relations just rewrite the first part of the recursion (4.28).
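For concreteness, here is a minimal sketch of the recursion for real vectors. We assume (4.28) has the usual classical Gram–Schmidt form $z_k = x_k - \sum_{j<k}\langle x_k, e_j\rangle e_j$, $e_k = z_k/\|z_k\|$; the function names and the sample vectors are ours. The check at the end confirms orthonormality and the triangular relations $\langle x_k, e_j\rangle = 0$ for $j > k$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(xs, tol=1e-12):
    """Classical Gram-Schmidt on real vectors xs[0], xs[1], ...

    z_k = x_k - sum_{j<k} <x_k, e_j> e_j and e_k = z_k / ||z_k||.
    Raises ValueError if the vectors are (numerically) dependent.
    """
    es = []
    for x in xs:
        z = list(x)
        for e in es:
            c = dot(x, e)
            z = [zi - c * ei for zi, ei in zip(z, e)]
        nz = math.sqrt(dot(z, z))
        if nz < tol:
            raise ValueError("vectors are linearly dependent")
        es.append([zi / nz for zi in z])
    return es


xs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
es = gram_schmidt(xs)
print([[round(dot(ej, ek), 12) for ek in es] for ej in es])  # identity matrix
```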


SOLUTION FOR EXERCISE 4.9. Without loss of generality we may
assume that $\|x\| = 1$. The Gram–Schmidt relations are then given by
$x = e_1$ and $y = \mu_1 e_1 + \mu_2 e_2$. Orthonormality gives us $|\langle x, y\rangle| = |\mu_1|$ and
$\langle y, y\rangle = |\mu_1|^2 + |\mu_2|^2$, and the bound $|\mu_1| \le (|\mu_1|^2 + |\mu_2|^2)^{1/2}$ is obvious.
But this says $|\langle x, y\rangle| \le \langle y, y\rangle^{1/2}$, which is the Cauchy–Schwarz inequality
when $\|x\| = 1$.


SOLUTION FOR EXERCISE 4.10. From the Gram–Schmidt process
applied to $\{y_1, y_2, \ldots, y_n, x\}$ one finds $e_1 = y_1, e_2 = y_2, \ldots, e_n = y_n$ and
$e_{n+1} = z/\|z\|$ where $z = x - (\langle x, e_1\rangle e_1 + \langle x, e_2\rangle e_2 + \cdots + \langle x, e_n\rangle e_n)$,
provided that $z \ne 0$. Taking inner products and using orthonormality
then gives us
$$
\langle x, x\rangle = \sum_{j=1}^{n+1} |\langle x, e_j\rangle|^2 = |\langle x, e_{n+1}\rangle|^2 + \sum_{j=1}^{n} |\langle x, y_j\rangle|^2,
$$
and since $|\langle x, e_{n+1}\rangle|^2 \ge 0$, this gives us Bessel's inequality when $z \ne 0$. When
$z = 0$ one finds that Bessel's inequality is in fact an identity.
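A tiny worked example makes the bookkeeping concrete. With the orthonormal pair $y_1 = (1,1,0)/\sqrt2$, $y_2 = (1,-1,0)/\sqrt2$ in $\mathbb{R}^3$ (our choice), the vector $x = (1,2,3)$ has $\sum_j \langle x, y_j\rangle^2 = 5$ and residual $z = (0,0,3)$, so $5 + 9 = 14 = \langle x, x\rangle$. For $x = (2,0,0)$, which lies in the span, $z = 0$ and Bessel's inequality is an identity.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


r = 1 / math.sqrt(2)
ys = [[r, r, 0.0], [r, -r, 0.0]]  # an orthonormal pair in R^3


def bessel_parts(x, ys):
    """Return (sum of <x, y_j>^2, ||z||^2, <x, x>) where z is the residual."""
    coeffs = [dot(x, y) for y in ys]
    z = list(x)
    for c, y in zip(coeffs, ys):
        z = [zi - c * yi for zi, yi in zip(z, y)]
    return sum(c * c for c in coeffs), dot(z, z), dot(x, x)


print(bessel_parts([1.0, 2.0, 3.0], ys))  # approximately (5.0, 9.0, 14.0)
print(bessel_parts([2.0, 0.0, 0.0], ys))  # approximately (4.0, 0.0, 4.0): z = 0
```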


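SOLUTION FOR EXERCISE 4.11. Without loss of generality we can
assume that $x$, $y$, and $z$ are linearly independent and $\|x\| = 1$, so the
Gram–Schmidt relations can be written as $x = e_1$, $y = \mu_1 e_1 + \mu_2 e_2$,
and $z = \nu_1 e_1 + \nu_2 e_2 + \nu_3 e_3$, from which we find $\langle x, x\rangle = 1$, $\langle x, y\rangle = \mu_1$,
$\langle x, z\rangle = \nu_1$, and $\langle y, z\rangle = \mu_1\nu_1 + \mu_2\nu_2$. The bound (4.30) asserts
$$
\mu_1\nu_1 \le \tfrac12\bigl(\mu_1\nu_1 + \mu_2\nu_2 + (\mu_1^2 + \mu_2^2)^{1/2}(\nu_1^2 + \nu_2^2 + \nu_3^2)^{1/2}\bigr),
$$
or $\mu_1\nu_1 - \mu_2\nu_2 \le (\mu_1^2 + \mu_2^2)^{1/2}(\nu_1^2 + \nu_2^2 + \nu_3^2)^{1/2}$, which is immediate from
Cauchy's inequality.

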
SOLUTION FOR EXERCISE 4.12. With the normalization and notation
used in the solution of Exercise 4.11, the left side $L$ of the bound (4.31)
can be written as
$$
L = |(\mu_1\bar\nu_1 + \mu_2\bar\nu_2) - \mu_1\bar\nu_1|^2 = |\mu_2\nu_2|^2,
$$
and the right side $R$ can be written as
$$
R = \{1 - |\langle x, y\rangle|^2\}\{1 - |\langle x, z\rangle|^2\} = (1 - |\mu_1|^2)(1 - |\nu_1|^2) = |\mu_2|^2(|\nu_2|^2 + |\nu_3|^2),
$$
since we have $1 = \|y\|^2 = |\mu_1|^2 + |\mu_2|^2$ and $1 = \|z\|^2 = |\nu_1|^2 + |\nu_2|^2 + |\nu_3|^2$.
These formulas for $L$ and $R$ make it evident that $L \le R$.
Now, the proof of the bound (4.32) similarly reduces to showing
$$
|\mu_1\bar\nu_1 + \mu_2\bar\nu_2|^2 + |\mu_1|^2 + |\nu_1|^2 \le 1 + (\mu_1\bar\nu_1 + \mu_2\bar\nu_2)\bar\mu_1\nu_1 + (\bar\mu_1\nu_1 + \bar\mu_2\nu_2)\mu_1\bar\nu_1,
$$
and, by expansion, this is the same as
$$
|\mu_1|^2 + |\nu_1|^2 + |\mu_1\nu_1|^2 + |\mu_2\nu_2|^2 + 2\,\mathrm{Re}\{\mu_1\bar\nu_1\bar\mu_2\nu_2\} \le 1 + 2|\mu_1\nu_1|^2 + 2\,\mathrm{Re}\{\bar\mu_1\nu_1\mu_2\bar\nu_2\}.
$$
After cancelling terms, we see it suffices for us to show
$$
L = |\mu_1|^2 + |\nu_1|^2 + |\mu_2\nu_2|^2 \le 1 + |\mu_1\nu_1|^2,
$$
but the substitution $|\mu_2\nu_2|^2 = (1 - |\mu_1|^2)(1 - |\nu_1|^2 - |\nu_3|^2)$ gives us
$L = 1 + |\mu_1\nu_1|^2 + |\nu_3|^2(|\mu_1|^2 - 1) \le 1 + |\mu_1\nu_1|^2$, since $|\mu_1|^2 \le 1$. This
exercise is based on Problems 16.50 and 16.51 of Hewitt and Stromberg
(1969, p. 254).
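The second computation has the shape of the statement that the $3\times3$ Gram determinant of three unit vectors, $1 + 2\,\mathrm{Re}(\langle x,y\rangle\langle y,z\rangle\langle z,x\rangle) - |\langle x,y\rangle|^2 - |\langle y,z\rangle|^2 - |\langle z,x\rangle|^2$, is nonnegative. We take that reading as an assumption and check it below for random complex unit vectors in $\mathbb{C}^3$, with an inner product that is linear in its first argument (our convention). The determinant should vanish when $z$ lies in the span of $x$ and $y$.

```python
import random


def ip(u, v):
    """Complex inner product, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def unit(u):
    n = abs(ip(u, u)) ** 0.5
    return [a / n for a in u]


def gram_det3(x, y, z):
    """Determinant of the Gram matrix of three unit vectors."""
    a, b, c = ip(x, y), ip(y, z), ip(z, x)
    return 1 + 2 * (a * b * c).real - abs(a) ** 2 - abs(b) ** 2 - abs(c) ** 2


def rand_unit(d=3):
    return unit([complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(d)])


random.seed(4)
dets = [gram_det3(rand_unit(), rand_unit(), rand_unit()) for _ in range(1000)]
x, y = rand_unit(), rand_unit()
z = unit([p + q for p, q in zip(x, y)])  # z in span{x, y}
print(min(dets), gram_det3(x, y, z))  # min(dets) >= 0; the last value is ~0
```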


SOLUTION FOR EXERCISE 4.13. Following the hint, we first note
$$
\|A^T v\|^2 = \langle A^T v, A^T v\rangle = \langle v, AA^T v\rangle \le \|v\|\,\|AA^T v\| = \|v\|\,\|A^T v\|,
$$
so by division $\|A^T v\| \le \|v\|$. Next, by the Cauchy–Schwarz inequality
and the properties of $A$ and $A^T$ we have the chain
$$
\|v\|^2 = \langle Av, Av\rangle = \langle v, A^T A v\rangle \le \|v\|\,\|A^T A v\| \le \|v\|\,\|Av\| = \|v\|^2,
$$
so we actually have equality where the first inequality is written. This
tells us that for $v \ne 0$ there is a $\lambda$ (which possibly depends on $v$) for which we
have $A^T A v = \lambda v$. This relation in turn gives us
$$
\lambda\langle v, v\rangle = \langle v, A^T A v\rangle = \langle Av, Av\rangle = \langle v, v\rangle,
$$
so in fact $\lambda = 1$ (and hence it does not actually depend on $v$). We
therefore find that $v = A^T A v$ for all $v$, so $A^T A = I$ as claimed. This
argument follows Sigillito (1968).


CHAPTER 5: CONSEQUENCES OF ORDER 


SOLUTION FOR EXERCISE 5.1. The upper bound of (5.17) follows from
$$
h_1 + h_2 + \cdots + h_n = \frac{h_1}{b_1}b_1 + \frac{h_2}{b_2}b_2 + \cdots + \frac{h_n}{b_n}b_n \le \{b_1 + b_2 + \cdots + b_n\}\max_k \frac{h_k}{b_k},
$$
and the lower bound is analogous. For application, if we set $a_k = c_k x^k$
and $b_k = c_k y^k$, then we have $\min_k a_k/b_k = (x/y)^n$ and $\max_k a_k/b_k = 1$.


SOLUTION FOR EXERCISE 5.2. The $n - 1$ elements of $S$ have mean $A$,
so by the induction hypothesis $H(n-1)$ we have
$$
a_3 a_4 \cdots a_n (a_1 + a_2 - A) \le A^{n-1}.
$$
The betweenness bound already gave us $a_1 a_2/A \le a_1 + a_2 - A$, and,
when we apply this bound above, we get $H(n)$, which completes the
induction.

This proof from Chong (1975) is closely related to a "smoothing" proof
of the AM-GM inequality which exploits the algorithm:

(i) if $a_1, a_2, \ldots, a_n$ are not all equal to the mean $A$, let $a_j$ and $a_k$
denote the smallest and largest, respectively,

(ii) replace $a_j$ by $A$ and replace $a_k$ by $a_j + a_k - A$,

(iii) note that each step of the algorithm increases by at least one the number
of terms equal to the mean, so the algorithm terminates in at
most $n$ steps.

The betweenness bound gives us $a_j a_k \le A(a_j + a_k - A)$, so each step
of the algorithm increases the geometric mean of the current sequence.
Since we start with the sequence $a_1, a_2, \ldots, a_n$ and terminate with a
sequence of $n$ copies of $A$, we see $a_1 a_2 \cdots a_n \le A^n$.
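The smoothing algorithm is easy to run exactly with rational arithmetic, which lets one watch the product (equivalently, the geometric mean) climb to $A^n$ without rounding noise. The starting values $1, 2, 7, 10$ (mean $A = 5$) and the function name are arbitrary choices of ours; here the products are $140, 420, 600, 625 = 5^4$.

```python
from fractions import Fraction
from math import prod


def smooth_to_mean(values):
    """Run steps (i)-(iii) with exact rationals.

    Returns the final list and the product of the terms after each step.
    """
    a = [Fraction(v) for v in values]
    n = len(a)
    A = sum(a) / n
    products = [prod(a)]
    while any(v != A for v in a):
        j = min(range(n), key=lambda i: a[i])  # index of the smallest term
        k = max(range(n), key=lambda i: a[i])  # index of the largest term
        a[j], a[k] = A, a[j] + a[k] - A
        products.append(prod(a))
    return a, products


final, products = smooth_to_mean([1, 2, 7, 10])
print(final, products)
```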


SOLUTION FOR EXERCISE 5.3. If one first considers $V = \mathbb{R}$ and sets
$a = u$ and $b = v$, then the inequality in question asserts that
$$
AB - ab \ge (A^2 - a^2)^{1/2}(B^2 - b^2)^{1/2}. \tag{14.51}
$$
By expansion and factorization, this is equivalent to
$(aB - Ab)^2 \ge 0$,
so the bound (14.51) is true and equality holds if and only if $aB = Ab$.
To address the general problem, we first note by the Cauchy–Schwarz
inequality
$$
AB - \langle u, v\rangle \ge AB - \langle u, u\rangle^{1/2}\langle v, v\rangle^{1/2},
$$
so, by the bound (14.51) with $a = \langle u, u\rangle^{1/2}$ and $b = \langle v, v\rangle^{1/2}$, one has
$$
AB - \langle u, v\rangle \ge (A^2 - \langle u, u\rangle)^{1/2}(B^2 - \langle v, v\rangle)^{1/2}, \tag{14.52}
$$
which was to be proved. If equality holds in the bound (14.52), this
argument shows that we have $\langle u, v\rangle = \langle u, u\rangle^{1/2}\langle v, v\rangle^{1/2}$, so there is a constant
$\lambda$ such that $u = \lambda v$. By substitution one then finds that $\lambda = A/B$.

The bound (14.52) is abstracted from an integral version given in
Theorem 9 of Lyusternik (1966), which Lyusternik used in his proof of
the Brunn–Minkowski inequality in two dimensions. The idea of viewing
$V = \mathbb{R}$ as a special inner product space is often useful, but seldom is
it as decisive as it proved to be here. One should also notice the easily
overlooked fact that the bound (14.52) is actually equivalent to the light
cone inequality (4.15).


SOLUTION FOR EXERCISE 5.4. This problem does not come with an
order relation, but we can give ourselves one if we note that by the
symmetry of the bound we can assume that $0 \le x \le y \le z$. We then
get for free the positivity of the first summand $x^\alpha(x - y)(x - z)$, so to
complete the proof we just need to show the positivity of the sum of the
other two. This follows from the factorization
$$
y^\alpha(y - x)(y - z) + z^\alpha(z - x)(z - y) = (z - y)\{z^\alpha(z - x) - y^\alpha(y - x)\}
$$
and the observation that $z^\alpha \ge y^\alpha$ and $z - x \ge y - x$.

This proof illustrates one of the most general methods at our disposal:
the positivity of a sum can often be proved by creatively grouping the
summands so that the positivity of each group becomes obvious.
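A brute-force grid check of the inequality $x^\alpha(x-y)(x-z) + y^\alpha(y-x)(y-z) + z^\alpha(z-x)(z-y) \ge 0$ for $x, y, z \ge 0$ is a cheap way to build confidence in the sign pattern used above. The exponents and grid below are arbitrary choices of ours, not values singled out by the exercise.

```python
import itertools


def schur_sum(x, y, z, alpha):
    return (x ** alpha * (x - y) * (x - z)
            + y ** alpha * (y - x) * (y - z)
            + z ** alpha * (z - x) * (z - y))


grid = [i / 4 for i in range(13)]  # 0, 0.25, ..., 3
worst = min(schur_sum(x, y, z, alpha)
            for x, y, z in itertools.product(grid, repeat=3)
            for alpha in (0.5, 1, 2, 3.5))
print(worst)  # 0 (attained at x = y = z), never negative
```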


SOLUTION FOR EXERCISE 5.5. This is one of the text's few "plug-in"
exercises, but the bound is so nice it had to be made explicit. We just
note that $m = a/B \le a_k/b_k \le A/b = M$, then we substitute into the
formulas (5.6) and (5.7).


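SOLUTION FOR EXERCISE 5.6. Without loss of generality, we can
assume that $0 < a \le b \le c$, and, under this assumption, we also have
$$
\frac{1}{b + c} \le \frac{1}{a + c} \le \frac{1}{a + b}.
$$
The rearrangement inequality then gives us
$$
\frac{a}{b + c} + \frac{b}{a + c} + \frac{c}{a + b} \ge \frac{b}{b + c} + \frac{c}{a + c} + \frac{a}{a + b}
$$
and that
$$
\frac{a}{b + c} + \frac{b}{a + c} + \frac{c}{a + b} \ge \frac{c}{b + c} + \frac{a}{a + c} + \frac{b}{a + b}.
$$
By summing these two bounds (whose right-hand sides add up to 3) we find
Nesbitt's inequality.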

Engel (1998, pp. 162–168) provides five instructive proofs of Nesbitt's
inequality, including the one given here, but, even so, one can add to
the list. Tony Cai recently noted that Nesbitt's inequality follows from
the bound (1.21), page 13, provided that one sets
$$
p_1 = \frac{a}{a + b + c}, \quad p_2 = \frac{b}{a + b + c}, \quad p_3 = \frac{c}{a + b + c},
$$
$$
a_1 = \frac{a + b + c}{b + c}, \quad a_2 = \frac{a + b + c}{a + c}, \quad a_3 = \frac{a + b + c}{a + b},
$$
and sets $b_k = 1/a_k$ for $k = 1, 2, 3$. With these substitutions the bound
(1.21) automatically gives us
$$
1 \le \left\{\frac{a}{b + c} + \frac{b}{a + c} + \frac{c}{a + b}\right\}\frac{(a + b + c)^2 - (a^2 + b^2 + c^2)}{(a + b + c)^2},
$$
which in turn yields Nesbitt's inequality since the second factor is bounded
by 2/3 because Cauchy's inequality for $(a, b, c)$ and $(1, 1, 1)$ tells us that
$(a + b + c)^2 \le 3(a^2 + b^2 + c^2)$.
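Both arguments can be spot-checked numerically: the Nesbitt sum never drops below $3/2$, and the product in Cai's bound never drops below 1, with equality in both at $a = b = c$. The sampling range and the function names are our own choices.

```python
import random


def nesbitt(a, b, c):
    return a / (b + c) + b / (a + c) + c / (a + b)


def cai_product(a, b, c):
    """Product of the two factors in the bound obtained from (1.21)."""
    s = a + b + c
    return nesbitt(a, b, c) * (s * s - (a * a + b * b + c * c)) / (s * s)


random.seed(5)
triples = [tuple(random.uniform(0.01, 10) for _ in range(3)) for _ in range(1000)]
print(min(nesbitt(*t) for t in triples), min(cai_product(*t) for t in triples))
print(nesbitt(1, 1, 1), cai_product(1, 1, 1))  # equality case: 1.5 and 1.0
```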


SOLUTION FOR EXERCISE 5.7. Since the sequences $\{c_k\}$ and $\{1/c_k\}$ are
oppositely ordered, the rearrangement inequality (5.12) tells us that for
any permutation $\sigma$ one has $n \le c_1/c_{\sigma(1)} + c_2/c_{\sigma(2)} + \cdots + c_n/c_{\sigma(n)}$, and
part (a) is a special case of this observation. If we set $c_k = x_1 x_2 \cdots x_k$
in part (a) we get part (b), and if we then replace $x_k$ by $\rho x_k$, we get
part (c). Finally, by setting $\rho = (x_1 x_2 \cdots x_n)^{-1/n}$ and simplifying, we get
the AM-GM bound.


SOLUTION FOR EXERCISE 5.8.

The inequality is unaffected if $m$, $M$, and $x_j$, $1 \le j \le n$, are multiplied
by a positive constant, so we can assume without loss of generality that
$\gamma = 1$, in which case we have $M = m^{-1}$, and it suffices to show that
$$
\Bigl\{\sum_{j=1}^n p_j x_j\Bigr\}\Bigl\{\sum_{j=1}^n p_j \frac{1}{x_j}\Bigr\} \le \mu^2, \tag{14.53}
$$
where $2\mu = m + M = m + m^{-1}$. Now, one has $x_j \in [m, m^{-1}]$ for all
$1 \le j \le n$, so we have
$$
x_j + \frac{1}{x_j} \le m + m^{-1} = 2\mu \quad\text{and}\quad \Bigl\{\sum_{j=1}^n p_j x_j\Bigr\} + \Bigl\{\sum_{j=1}^n p_j \frac{1}{x_j}\Bigr\} \le 2\mu,
$$
and these yield the bound (14.53) after one applies the AM-GM
inequality to the two bracketed terms. There are many instructive proofs
of Kantorovich's inequality; this elegant approach via the AM-GM
inequality is due to Pták (1995).
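In unnormalized form the bound reads $\{\sum p_j x_j\}\{\sum p_j/x_j\} \le (m+M)^2/(4mM)$ for weights $p_j \ge 0$ summing to 1 and $x_j \in [m, M]$. The sketch below checks this for random data, taking $m$ and $M$ to be the smallest and largest $x_j$, and then shows the equality case $p = (1/2, 1/2)$, $x = (m, M)$. The sampling choices are ours.

```python
import random


def kantorovich_sides(p, x):
    """Return (sum p_j x_j)(sum p_j / x_j) and (m + M)^2 / (4 m M),
    where m and M are the smallest and largest x_j."""
    m, M = min(x), max(x)
    lhs = sum(pj * xj for pj, xj in zip(p, x)) * sum(pj / xj for pj, xj in zip(p, x))
    return lhs, (m + M) ** 2 / (4 * m * M)


rng = random.Random(6)
cases = []
for _ in range(1000):
    w = [rng.random() for _ in range(5)]
    total = sum(w)
    p = [wi / total for wi in w]  # weights summing to 1
    x = [rng.uniform(0.5, 8) for _ in range(5)]
    cases.append(kantorovich_sides(p, x))
print(max(lhs / rhs for lhs, rhs in cases))  # never exceeds 1 (up to rounding)
print(kantorovich_sides([0.5, 0.5], [1.0, 4.0]))  # equality: (1.5625, 1.5625)
```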


SOLUTION FOR EXERCISE 5.9. One elegant way to make the monotonicity
of $f(\theta)$ evident is to set $c_j = a_j b_j$ and $d_j = \log(a_j/b_j)$ to obtain
$$
f(\theta) = \sum_{j=1}^n c_j e^{\theta d_j} \sum_{j=1}^n c_j e^{-\theta d_j} = \sum_{j=1}^n c_j^2 + 2\sum_{j<k} c_j c_k \cosh\bigl(\theta(d_j - d_k)\bigr),
$$
where $\cosh y = (e^y + e^{-y})/2$. Since $\cosh y$ is symmetric about zero and
monotone on $[0, \infty)$, the monotonicity of $f(\cdot)$ is now immediate. This
solution follows Steiger (1969) where a second proof based on Hölder's
inequality is also given.
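The representation above corresponds to $f(\theta) = \sum_j a_j^{1+\theta} b_j^{1-\theta} \sum_j a_j^{1-\theta} b_j^{1+\theta}$; we assume that is the function of the exercise. The sketch checks that the two formulas agree, that $f$ is nondecreasing on $[0,1]$, and that it runs from $(\sum a_j b_j)^2$ at $\theta = 0$ to $\sum a_j^2 \sum b_j^2$ at $\theta = 1$. Data and names are ours.

```python
import math
import random


def f(theta, a, b):
    """Assumed form of f: (sum a^(1+t) b^(1-t)) * (sum a^(1-t) b^(1+t))."""
    s1 = sum(x ** (1 + theta) * y ** (1 - theta) for x, y in zip(a, b))
    s2 = sum(x ** (1 - theta) * y ** (1 + theta) for x, y in zip(a, b))
    return s1 * s2


def f_cosh(theta, a, b):
    """The same quantity via c_j = a_j b_j and d_j = log(a_j / b_j)."""
    c = [x * y for x, y in zip(a, b)]
    d = [math.log(x / y) for x, y in zip(a, b)]
    n = len(c)
    return sum(cj * cj for cj in c) + 2 * sum(
        c[j] * c[k] * math.cosh(theta * (d[j] - d[k]))
        for j in range(n) for k in range(j + 1, n))


random.seed(7)
a = [random.uniform(0.1, 3) for _ in range(6)]
b = [random.uniform(0.1, 3) for _ in range(6)]
values = [f(k / 20, a, b) for k in range(21)]  # theta = 0, 0.05, ..., 1
print(values[0], values[-1])
```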


SOLUTION FOR EXERCISE 5.10. We can assume without loss of
generality that $a_1 \ge a_2$, $b_1 \ge b_2$, and $a_1 \ge b_1$. Remaining mindful of the
relation $a_1 + a_2 = b_1 + b_2$, the proof can be completed by the factorization
$$
\begin{aligned}
&x^{a_1}y^{a_2} + x^{a_2}y^{a_1} - x^{b_1}y^{b_2} - x^{b_2}y^{b_1} \\
&\quad = x^{a_2}y^{a_2}\bigl(x^{a_1 - a_2} + y^{a_1 - a_2} - x^{b_1 - a_2}y^{b_2 - a_2} - x^{b_2 - a_2}y^{b_1 - a_2}\bigr) \\
&\quad = x^{a_2}y^{a_2}\bigl(x^{b_1 - a_2} - y^{b_1 - a_2}\bigr)\bigl(x^{b_2 - a_2} - y^{b_2 - a_2}\bigr) \ge 0,
\end{aligned}
$$
since $b_1 - a_2 \ge b_2 - a_2 = a_1 - b_1 \ge 0$. Lee (2002) notes that the bound
(5.22) may be used to prove analogous inequalities with three or more
variables. Chapter 13 will develop such inequalities by other methods.
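The factorization can be tested numerically by drawing exponents with $a_1 \ge b_1 \ge b_2 \ge a_2$ and $a_1 + a_2 = b_1 + b_2$ and positive $x$, $y$. The sampling scheme and function names are our own; the check confirms both that the two sides of the factorization agree and that the gap is never negative.

```python
import random


def gap(x, y, a1, a2, b1, b2):
    """x^a1 y^a2 + x^a2 y^a1 - x^b1 y^b2 - x^b2 y^b1."""
    return (x ** a1 * y ** a2 + x ** a2 * y ** a1
            - x ** b1 * y ** b2 - x ** b2 * y ** b1)


def factored(x, y, a1, a2, b1, b2):
    """x^a2 y^a2 (x^(b1-a2) - y^(b1-a2)) (x^(b2-a2) - y^(b2-a2))."""
    p, q = b1 - a2, b2 - a2
    return (x * y) ** a2 * (x ** p - y ** p) * (x ** q - y ** q)


def random_case(rng):
    """Exponents with a1 >= b1 >= b2 >= a2 and a1 + a2 = b1 + b2."""
    a2 = rng.uniform(0, 2)
    a1 = a2 + rng.uniform(0, 3)
    b1 = rng.uniform((a1 + a2) / 2, a1)
    b2 = a1 + a2 - b1
    return rng.uniform(0.1, 3), rng.uniform(0.1, 3), a1, a2, b1, b2


rng = random.Random(8)
cases = [random_case(rng) for _ in range(2000)]
print(min(gap(*c) for c in cases))  # never negative (up to rounding)
```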


SOLUTION FOR EXERCISE 5.11. Let $A$ denote the event that $|Z - \mu|$
is at least as large as $\lambda$. Now, define a random variable $\chi_A$ by setting
$\chi_A = 1$ if the event $A$ occurs and setting $\chi_A = 0$ otherwise. Note that
$E(\chi_A) = P(A) = P(|Z - \mu| \ge \lambda)$. Also note that $\chi_A \le |Z - \mu|^2/\lambda^2$,
since the left side is zero and the right side is nonnegative if $A$ does not occur,
and the right side is at least as large as 1 if the event $A$ does occur. On taking
the expectation of the last bound one gets Chebyshev's tail bound (5.23).
Admittedly, the language used in this problem and its solution is special to
probability theory, but nevertheless the argument is completely rigorous.
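Both steps of the argument can be seen exactly on a small example. For a fair die (our choice of $Z$), $\mu = 7/2$ and the variance is $35/12$; the sketch checks the pointwise bound $\chi_A \le |Z-\mu|^2/\lambda^2$ face by face and then compares the exact tail probability with $\mathrm{Var}(Z)/\lambda^2$.

```python
from fractions import Fraction

faces = range(1, 7)  # a fair die: each face has probability 1/6
prob = Fraction(1, 6)
mu = sum(prob * z for z in faces)               # 7/2
var = sum(prob * (z - mu) ** 2 for z in faces)  # 35/12


def tail(lam):
    """P(|Z - mu| >= lam), computed exactly."""
    return sum(prob for z in faces if abs(z - mu) >= lam)


lams = [Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5, 2)]
for lam in lams:
    # The indicator of A is dominated pointwise by |Z - mu|^2 / lam^2 ...
    assert all((1 if abs(z - mu) >= lam else 0) <= (z - mu) ** 2 / lam ** 2
               for z in faces)
    # ... so taking expectations gives Chebyshev's tail bound.
    print(lam, tail(lam), var / lam ** 2)
```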


CHAPTER 6: CONVEXITY — THE THIRD PILLAR 


SOLUTION FOR EXERCISE 6.1. Cancelling $1/x$ from both sides and
adding the fractions, one sees that Mengoli's inequality is equivalent to
the trivial bound $x^2 > x^2 - 1$. For a proof using Jensen's inequality, just
note that $x \mapsto 1/x$ is convex. Finally, for a modern version of Mengoli's
proof that $H_\infty$ diverges, we assume $H_\infty < \infty$ and write $H_\infty$ as
$$
1 + \Bigl(\frac12 + \frac13 + \frac14\Bigr) + \Bigl(\frac15 + \frac16 + \frac17\Bigr) + \Bigl(\frac18 + \frac19 + \frac1{10}\Bigr) + \cdots.
$$
Now, by applying Mengoli's inequality within the indicated groups we
find the lower bound $1 + 3/3 + 3/6 + 3/9 + \cdots = 1 + H_\infty$, which yields
the contradiction $H_\infty > 1 + H_\infty$.

By the way, according to Havil (2003, p. 38) it was Mengoli who in
1650 first posed the corresponding problem of determining the value
of the sum $1 + 1/2^2 + 1/3^2 + \cdots$. The problem resisted the efforts of
Europe's finest mathematicians until 1731 when L. Euler determined the
value to be $\pi^2/6$.
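A finite version of the grouping argument can be checked with exact fractions: grouping the terms of $H_{3m+1}$ in threes around $3, 6, \ldots, 3m$ and applying Mengoli's inequality to each group gives $H_{3m+1} > 1 + H_m$. The ranges tested and the helper names are our own choices.

```python
from fractions import Fraction


def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))


def mengoli_holds(x):
    """Mengoli's inequality 1/(x-1) + 1/x + 1/(x+1) > 3/x for an integer x >= 2."""
    return Fraction(1, x - 1) + Fraction(1, x) + Fraction(1, x + 1) > Fraction(3, x)


# H_{3m+1} = 1 + sum over j of (1/(3j-1) + 1/(3j) + 1/(3j+1)) > 1 + H_m.
checks = [(harmonic(3 * m + 1), 1 + harmonic(m)) for m in range(1, 30)]
print(all(mengoli_holds(x) for x in range(2, 200)), all(h > g for h, g in checks))
```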


SOLUTION FOR EXERCISE 6.2. The bound follows by applying Jensen's
inequality to the function $f(t) = \log(1 + 1/t) = \log(1 + t) - \log(t)$, which
is convex because
$$
f''(t) = -\frac{1}{(1 + t)^2} + \frac{1}{t^2} > 0 \quad \text{for } t > 0.
$$


SOLUTION FOR EXERCISE 6.3. From the geometry of Figure 6.4, the
area $A$ of an inscribed polygon with $n$ sides can be written as
$$
A = \frac12 \sum_{k=1}^n \sin(\theta_k), \quad\text{where } 0 < \theta_k < \pi \text{ and } \sum_{k=1}^n \theta_k = 2\pi.
$$
Since $\sin(\cdot)$ is strictly concave on $[0, \pi]$, we have
$$
A = \frac12 \sum_{k=1}^n \sin(\theta_k) \le \frac{n}{2}\sin\Bigl(\frac1n \sum_{k=1}^n \theta_k\Bigr) = \frac{n}{2}\sin(2\pi/n) = A',
$$
and we have equality if and only if $\theta_k = 2\pi/n$ for all $1 \le k \le n$. Since
$A'$ is the area of a regular inscribed $n$-gon, the conjectured optimality is
confirmed.
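A Monte Carlo illustration for polygons inscribed in the unit circle: random central angles (positive, each below $\pi$, summing to $2\pi$) never produce more area than the regular $n$-gon's $\frac n2\sin(2\pi/n)$. The choice $n = 7$, the rejection sampler, and the names are ours.

```python
import math
import random


def inscribed_area(thetas):
    """Area of a polygon inscribed in the unit circle with central angles thetas."""
    return 0.5 * sum(math.sin(t) for t in thetas)


def random_central_angles(n, rng):
    """Random positive angles summing to 2*pi, each less than pi (rejection sampling)."""
    while True:
        w = [rng.random() for _ in range(n)]
        s = sum(w)
        thetas = [2 * math.pi * wi / s for wi in w]
        if max(thetas) < math.pi:
            return thetas


rng = random.Random(9)
n = 7
regular = n / 2 * math.sin(2 * math.pi / n)
areas = [inscribed_area(random_central_angles(n, rng)) for _ in range(2000)]
print(max(areas), regular)  # the random areas stay below the regular n-gon's area
```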


SOLUTION FOR EXERCISE 6.4. The second bound is the AM-GM
inequality for $a_k = 1 + r_k$, $k = 1, 2, \ldots, n$. The first bound follows from
Jensen's inequality applied to the convex function $x \mapsto \log(1 + e^x)$.
Finally, by taking $n$th roots and subtracting 1, we see that the investment
inequality (6.23) refines the AM-GM bound $r_G \le r_A$ by slipping $V^{1/n} - 1$
between the two means.


SOLUTION FOR EXERCISE 6.5. To build a proof with Jensen's inequality,
we first divide by $(a_1 a_2 \cdots a_n)^{1/n}$ and write $c_k$ for $b_k/a_k$, so the target
inequality takes the form
$$
1 + (c_1 c_2 \cdots c_n)^{1/n} \le \{(1 + c_1)(1 + c_2)\cdots(1 + c_n)\}^{1/n}.
$$
Now, if we take logs and write $c_j$ as $\exp(d_j)$, we find it takes the form
$$
\log(1 + \exp(\bar d)) \le \frac1n \sum_{j=1}^n \log(1 + \exp(d_j)),
$$
where $\bar d = (d_1 + d_2 + \cdots + d_n)/n$. Finally, the last inequality is simply
Jensen's inequality for the convex function $x \mapsto \log(1 + e^x)$, so the
solution is complete. One feature of this solution worth noting is that
progress came quickly after division reduced the number of variables
from $2n$ to $n$. This phenomenon is actually rather common, and such
reductions are almost always worth a try.

Here it is perhaps worth noting that Minkowski's proof used yet
another idea. Specifically, he built his proof on analysis of the polynomial
$p(t) = \prod_j (a_j + t b_j)$. Can you recover his proof?
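The target inequality $(a_1\cdots a_n)^{1/n} + (b_1\cdots b_n)^{1/n} \le ((a_1+b_1)\cdots(a_n+b_n))^{1/n}$ is easy to spot-check; the sketch also confirms equality when $b$ is proportional to $a$ (all $c_k$ equal, so Jensen's inequality is tight). Random ranges and names are ours.

```python
import math
import random


def gm(xs):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


random.seed(10)
gaps = []
for _ in range(1000):
    a = [random.uniform(0.1, 5) for _ in range(5)]
    b = [random.uniform(0.1, 5) for _ in range(5)]
    gaps.append(gm([x + y for x, y in zip(a, b)]) - gm(a) - gm(b))

a = [0.5, 1.0, 2.0, 4.0]
proportional_gap = gm([3 * x for x in a]) - gm(a) - gm([2 * x for x in a])
print(min(gaps), proportional_gap)  # min(gaps) >= 0; the proportional gap is ~0
```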


SOLUTION FOR EXERCISE 6.6. Essentially no change is needed in
Cauchy's argument (page 20). First, for the cases $n = 2^k$, $k = 1, 2, \ldots$,
one just applies the defining relation (6.25) to successive halves. For the
fall-back step, one chooses $k$ such that $n < 2^k$ and applies the $2^k$ result
to the padded sequence $y_j$, $1 \le j \le 2^k$, which one defines by taking
$y_j = x_j$ for $1 \le j \le n$ and by taking $y_j = (x_1 + x_2 + \cdots + x_n)/n$ for
$n < j \le 2^k$.


SOLUTION FOR EXERCISE 6.7. As we noted in the preceding solution,
iteration of the defining condition (6.24) gives us for all $k = 1, 2, \ldots$ that
$$
f\Bigl(\frac{1}{2^k}\sum_{j=1}^{2^k} x_j\Bigr) \le \frac{1}{2^k}\sum_{j=1}^{2^k} f(x_j),
$$
so setting $x_j = x$ for $1 \le j \le m$ and $x_j = y$ for $m < j \le 2^k$ we also have
$$
f\bigl((m/2^k)x + (1 - m/2^k)y\bigr) \le (m/2^k)f(x) + (1 - m/2^k)f(y).
$$
If we now choose $m_t$ and $k_t$ such that $m_t/2^{k_t} \to p$ as $t \to \infty$, then
continuity of $f$ and the preceding bound give us convexity of the kind
required by the modern definition (6.1).


SOLUTION FOR EXERCISE 6.8. The function $L(x, y, z)$ is convex in each
of its three variables separately and, by the argument detailed below,
this implies that $L$ must attain its maximum at one of the vertices of
the cube. After eight easy evaluations we find that $L(1, 0, 0) = 2$ and
that no other corner has a larger value, so the solution is complete.

It is also easy to show that if a function on the cube is convex in each
variable separately then the function must attain its maximum on one
of the corner points. In essence one argues by induction but, for the
cube in $\mathbb{R}^3$, one may as well give all of the steps.

First, one notes that a convex function on $[0, 1]$ must take its
maximum at one of the end points of the interval, so, for any fixed values
of $y$ and $z$, we have the bound $L(x, y, z) \le \max\{L(0, y, z), L(1, y, z)\}$.
Similarly, by convexity of $y \mapsto L(0, y, z)$ and $y \mapsto L(1, y, z)$, the value $L(0, y, z)$
is bounded by $\max\{L(0, 0, z), L(0, 1, z)\}$ and $L(1, y, z)$ is bounded by
$\max\{L(1, 0, z), L(1, 1, z)\}$. All together, we have for each value of $z$ that
$L(x, y, z)$ is bounded by $\max\{L(0, 0, z), L(0, 1, z), L(1, 0, z), L(1, 1, z)\}$.
Convexity of $z \mapsto L(e_1, e_2, z)$ for each of the four choices of $(e_1, e_2)$ then gives us the final
bound $L(x, y, z) \le \max\{L(e_1, e_2, e_3) : e_k = 0 \text{ or } e_k = 1 \text{ for } k = 1, 2, 3\}$.

One should note that this argument does not show that one can find
the maximum by the "greedy algorithm" that performs three successive
maximizations. In fact, the greedy algorithm can fail miserably here, as
easy examples show.


SOLUTION FOR EXERCISE 6.9. To prove the first formula, we note
$$
\begin{aligned}
a^2 &= b^2 + c^2 - 2bc\cos\alpha = (b - c)^2 + 2bc(1 - \cos\alpha) \\
&= (b - c)^2 + 4A(1 - \cos\alpha)/\sin\alpha = (b - c)^2 + 4A\tan(\alpha/2),
\end{aligned}
$$
so, by symmetry and summing, we see that $a^2 + b^2 + c^2$ is equal to
$$
(a - b)^2 + (b - c)^2 + (c - a)^2 + 4A\bigl(\tan(\alpha/2) + \tan(\beta/2) + \tan(\gamma/2)\bigr).
$$
Since $x \mapsto \tan x$ is convex on $[0, \pi/2)$, Jensen's inequality gives us
$$
\frac13\bigl\{\tan(\alpha/2) + \tan(\beta/2) + \tan(\gamma/2)\bigr\} \ge \tan\Bigl(\frac{\alpha + \beta + \gamma}{6}\Bigr) = \tan(\pi/6),
$$
and $3\tan(\pi/6) = \sqrt3$, so this completes the proof. Engel (1998, p. 173)
gives this as the eighth among his eleven amusing proofs of Weitzenböck's
inequality and its refinements.
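The refined inequality proved here, $a^2 + b^2 + c^2 \ge (a-b)^2 + (b-c)^2 + (c-a)^2 + 4\sqrt3\,A$, can be spot-checked on random triangles in the complex plane, with equality for an equilateral triangle. The sampling box and function names are our own.

```python
import cmath
import math
import random


def hf_gap(P, Q, R):
    """a^2+b^2+c^2 - [(a-b)^2+(b-c)^2+(c-a)^2] - 4*sqrt(3)*Area for triangle PQR."""
    a, b, c = abs(Q - R), abs(P - R), abs(P - Q)
    area = abs(((Q - P).conjugate() * (R - P)).imag) / 2
    return (a * a + b * b + c * c
            - ((a - b) ** 2 + (b - c) ** 2 + (c - a) ** 2)
            - 4 * math.sqrt(3) * area)


rng = random.Random(11)


def rand_pt():
    return complex(rng.uniform(-3, 3), rng.uniform(-3, 3))


gaps = [hf_gap(rand_pt(), rand_pt(), rand_pt()) for _ in range(2000)]
equilateral = hf_gap(0j, 1 + 0j, cmath.exp(1j * math.pi / 3))
print(min(gaps), equilateral)  # min(gaps) >= 0; equilateral gap is ~0
```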


SOLUTION FOR EXERCISE 6.10. The polynomial $Q(x)$ can be written
as a sum of three simple quadratics:
$$
Q(x) = f(x_1)\frac{(x - x_2)(x - \mu)}{(x_1 - x_2)(x_1 - \mu)} + f(x_2)\frac{(x - x_1)(x - \mu)}{(x_2 - x_1)(x_2 - \mu)} + f(\mu)\frac{(x - x_1)(x - x_2)}{(\mu - x_1)(\mu - x_2)}.
$$
By two applications of Rolle's theorem we see that $Q'(x) - f'(x)$ has
a zero in $(x_1, \mu)$ and a zero in $(\mu, x_2)$, so a third application of Rolle's
theorem shows there is an $x^*$ between these zeros for which we have
$0 = Q''(x^*) - f''(x^*)$. We therefore have $Q''(x^*) = f''(x^*) \ge 0$, but
$$
Q''(x^*) = \frac{2f(x_1)}{(x_1 - x_2)(x_1 - \mu)} + \frac{2f(x_2)}{(x_2 - x_1)(x_2 - \mu)} + \frac{2f(\mu)}{(\mu - x_1)(\mu - x_2)},
$$
so, by setting $p = (x_2 - \mu)/(x_2 - x_1)$ and $q = (\mu - x_1)/(x_2 - x_1)$ and
simplifying, one finds that the last inequality reduces to the definition
of the convexity of $f$.


SOLUTION FOR EXERCISE 6.11. Given the hint, we obviously want
to consider the change of variables $\alpha = \tan^{-1}(a)$, $\beta = \tan^{-1}(b)$, and
$\gamma = \tan^{-1}(c)$. The conditions $a > 0$, $b > 0$, $c > 0$, and $a + b + c = abc$ now
tell us that $\alpha > 0$, $\beta > 0$, $\gamma > 0$, and $\alpha + \beta + \gamma = \pi$. The target inequality
also becomes $\cos\alpha + \cos\beta + \cos\gamma \le 3/2$, and this follows directly from
Jensen's inequality in view of the concavity of cosine on $[0, \pi/2]$ (each of the
angles lies in $(0, \pi/2)$) and the evaluation $\cos(\pi/3) = 1/2$. This solution follows
Andreescu and Feng (2000, p. 86). Hojoo Lee has given another solution which
exploits the homogenization trick which we discuss in Chapter 12 (page 189).


SOLUTION FOR EXERCISE 6.12. If we write
$$
P(z) = a_n(z - r_1)^{m_1}(z - r_2)^{m_2}\cdots(z - r_k)^{m_k},
$$
where $r_1, r_2, \ldots, r_k$ are the distinct roots of $P(z)$, and $m_1, m_2, \ldots, m_k$
are the corresponding multiplicities, then comparison of $P'(z)$ and $P(z)$
gives us the familiar formula
$$
\frac{P'(z)}{P(z)} = \frac{m_1}{z - r_1} + \frac{m_2}{z - r_2} + \cdots + \frac{m_k}{z - r_k}.
$$
Now, if $z_0$ is a root of $P'(z)$ which is also a root of $P(z)$, then $z_0$ is
automatically in $H$, so without loss of generality, we may assume that
$z_0$ is a root of $P'(z)$ that is not a root of $P(z)$, in which case we find
$$
0 = \frac{m_1}{z_0 - r_1} + \frac{m_2}{z_0 - r_2} + \cdots + \frac{m_k}{z_0 - r_k} = \frac{m_1(\bar z_0 - \bar r_1)}{|z_0 - r_1|^2} + \frac{m_2(\bar z_0 - \bar r_2)}{|z_0 - r_2|^2} + \cdots + \frac{m_k(\bar z_0 - \bar r_k)}{|z_0 - r_k|^2}.
$$
If we set $w_j = m_j/|z_0 - r_j|^2$, then, after taking complex conjugates, we can
rewrite this identity as
$$
z_0 = \frac{w_1 r_1 + w_2 r_2 + \cdots + w_k r_k}{w_1 + w_2 + \cdots + w_k},
$$
which shows $z_0$ is a convex combination of the roots of $P(z)$.
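The weights in this argument can be computed explicitly for a concrete polynomial. The sketch takes the monic cubic with the arbitrary simple roots $0$, $4$, $1 + 3i$ (so every $m_j = 1$), finds the two zeros of $P'$ with the quadratic formula, and confirms that each one equals $\sum w_j r_j / \sum w_j$ with $w_j = 1/|z_0 - r_j|^2$.

```python
import cmath

roots = [0j, 4 + 0j, 1 + 3j]  # simple roots of a monic cubic P, so every m_j = 1
s1 = sum(roots)
s2 = roots[0] * roots[1] + roots[0] * roots[2] + roots[1] * roots[2]
# P(z) = z^3 - s1 z^2 + s2 z - s3, hence P'(z) = 3 z^2 - 2 s1 z + s2.
disc = cmath.sqrt(4 * s1 * s1 - 12 * s2)
critical = [(2 * s1 + disc) / 6, (2 * s1 - disc) / 6]


def convex_weights(z0, roots):
    """Weights w_j = m_j / |z0 - r_j|^2 with all m_j = 1."""
    return [1 / abs(z0 - r) ** 2 for r in roots]


combos = []
for z0 in critical:
    w = convex_weights(z0, roots)
    combos.append(sum(wj * r for wj, r in zip(w, roots)) / sum(w))
print(critical, combos)  # each critical point equals its weighted average of the roots
```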


SOLUTION FOR EXERCISE 6.13. Write $r_1, r_2, \ldots, r_n$ for the roots of $P$
repeated according to their multiplicity, and for a $z$ which is outside of
the convex hull $H$ write $z - r_j$ in polar form $z - r_j = \rho_j e^{i\theta_j}$. We then
have
$$
\frac{1}{z - r_j} = \rho_j^{-1}e^{-i\theta_j}, \qquad 1 \le j \le n,
$$
and the spread in the arguments $\theta_j$, $1 \le j \le n$, is not more than $2\psi$.
Thus, by the complex AM-GM inequality (2.35) one has the bound
$$
(\cos\psi)\left|\frac{1}{z - r_1}\cdot\frac{1}{z - r_2}\cdots\frac{1}{z - r_n}\right|^{1/n} \le \frac1n\left|\sum_{j=1}^n \frac{1}{z - r_j}\right|,
$$
and, in terms of $P$ and $P'$ (with $a_n$ the leading coefficient of $P$), this simply says
$$
(\cos\psi)\left|\frac{a_n}{P(z)}\right|^{1/n} \le \frac1n\left|\frac{P'(z)}{P(z)}\right| \quad\text{for all } z \notin H, \tag{14.54}
$$
just as we hoped to prove.


SOLUTION FOR EXERCISE 6.14. If $2\psi$ is the viewing angle determined by
$U$ when viewed from $z \notin U$, then we have $1 = |z|\sin\psi$, so Pythagoras's
theorem tells us that $\cos\psi = (1 - |z|^{-2})^{1/2}$. The target inequality (6.27)
then follows directly from Wilf's bound (6.26).


SOLUTION FOR EXERCISE 6.15. This is American Mathematical Monthly
Problem E10940 posed by Y. Nievergelt. We consider the solution by
A. Nakhash. The disk $D_0 = \{z : |1 - z| \le 1\}$ in polar coordinates is
$\{re^{i\theta} : 0 \le r \le 2\cos\theta, -\pi/2 \le \theta \le \pi/2\}$, so for each $j$ we can write
$1 + z_j$ as $r_j e^{i\theta_j}$ where $-\pi/2 \le \theta_j \le \pi/2$ and where $r_j \le 2\cos\theta_j$. It is
immediate that $z_0 = -1 + (r_1 r_2 \cdots r_n)^{1/n}\exp(i(\theta_1 + \theta_2 + \cdots + \theta_n)/n)$
solves Nievergelt's equation (6.28), and to prove that $z_0 \in D$ it suffices
to show $1 + z_0 \in D_0$; equivalently, we need to show
$$
(r_1 r_2 \cdots r_n)^{1/n} \le 2\cos\Bigl(\frac{\theta_1 + \theta_2 + \cdots + \theta_n}{n}\Bigr). \tag{14.55}
$$
Since $(r_1 r_2 \cdots r_n)^{1/n}$ is bounded by $((2\cos\theta_1)(2\cos\theta_2)\cdots(2\cos\theta_n))^{1/n}$,
it therefore suffices to show that
$$
((\cos\theta_1)(\cos\theta_2)\cdots(\cos\theta_n))^{1/n} \le \cos\Bigl(\frac{\theta_1 + \theta_2 + \cdots + \theta_n}{n}\Bigr),
$$
and this follows from the concavity of $f(x) = \log(\cos x)$ on $-\pi/2 < x < \pi/2$
together with Jensen's inequality.
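The construction of $z_0$ can be tried numerically. Here we take $D$ to be the closed unit disk (so that $z_j \in D$ puts $1 + z_j$ in $D_0$), which is our reading of the notation, and we take Nievergelt's equation to be $\prod_j (1 + z_j) = (1 + z_0)^n$, also an assumption. For random points of $D$ the sketch confirms that $z_0$ stays in $D$ and satisfies that product identity. Names and sampling are ours.

```python
import cmath
import math
import random


def nakhash_point(zs):
    """z0 = -1 + (r_1...r_n)^(1/n) exp(i(theta_1+...+theta_n)/n), 1 + z_j = r_j e^{i theta_j}."""
    n = len(zs)
    polar = [cmath.polar(1 + z) for z in zs]  # theta_j in [-pi/2, pi/2] when |z_j| <= 1
    r = math.exp(sum(math.log(rj) for rj, _ in polar) / n)
    theta = sum(tj for _, tj in polar) / n
    return -1 + cmath.rect(r, theta)


def random_in_disk(rng):
    """A random point of the closed unit disk (uniform in area)."""
    return cmath.rect(math.sqrt(rng.random()), rng.uniform(-math.pi, math.pi))


rng = random.Random(12)
trials = []
for _ in range(1000):
    zs = [random_in_disk(rng) for _ in range(rng.randint(1, 6))]
    trials.append((zs, nakhash_point(zs)))
print(max(abs(z0) for _, z0 in trials))  # at most 1, up to rounding
```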


SOLUTION FOR EXERCISE 6.16. A nice solution using Jensen's inequality
for $f(x) = 1/x$ was given by Robert Israel in the sci.math newsgroup
in 1999. If we set $S = a_1 + a_2 + a_3 + a_4$ and let $C$ denote the sum on
the right-hand side of the bound (6.29), then Jensen's inequality with $p_j = a_j/S$
and $x_1 = a_2 + a_3$, $x_2 = a_3 + a_4$, $x_3 = a_4 + a_1$, and $x_4 = a_1 + a_2$ gives us
$C/S \ge \{D/S\}^{-1}$ or $C \ge S^2/D$, where one has set
$$
D = a_1(a_2 + a_3) + a_2(a_3 + a_4) + a_3(a_4 + a_1) + a_4(a_1 + a_2).
$$
Now, it is easy to check that $S^2 - 2D = (a_1 - a_3)^2 + (a_2 - a_4)^2 \ge 0$,
and this lucky fact suffices to complete the solution.
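With the $x_j$ above, the sum is $C = \sum_j a_j/x_j$, and the two steps give $C \ge S^2/D \ge 2$. The sketch checks the Jensen step, the "lucky" identity for $S^2 - 2D$, and the resulting bound $C \ge 2$ on random positive inputs; the sampling range and names are ours.

```python
import random


def israel_parts(a1, a2, a3, a4):
    """Return S, C = sum a_j / x_j, and D for the choices of x_j in the text."""
    S = a1 + a2 + a3 + a4
    C = a1 / (a2 + a3) + a2 / (a3 + a4) + a3 / (a4 + a1) + a4 / (a1 + a2)
    D = a1 * (a2 + a3) + a2 * (a3 + a4) + a3 * (a4 + a1) + a4 * (a1 + a2)
    return S, C, D


rng = random.Random(13)
samples = [tuple(rng.uniform(0.01, 10) for _ in range(4)) for _ in range(2000)]
print(min(israel_parts(*a)[1] for a in samples))  # never below 2
```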
SOLUTION FOR EXERCISE 6.17. By interpolation and convexity one has
$$
f(x) \le \frac{b - x}{b - a}f(a) + \frac{x - a}{b - a}f(b),
$$
so, after subtracting $f(a)$, we find
$$
f(x) - f(a) \le \frac{x - a}{b - a}\{f(b) - f(a)\}. \tag{14.56}
$$
This gives us the first inequality of (6.30), and the second is proved
in the same way.


SOLUTION FOR EXERCISE 6.18. Let $g(h) = \{f(x + h) - f(x)\}/h$ and
check from the Three Chord Lemma that for $0 < h_1 < h_2$ one has
$g(h_1) \le g(h_2)$. Next choose $y$ with $a < y < x$ and use the Three Chord
Lemma to check that $-\infty < \{f(x) - f(y)\}/\{x - y\} \le g(h)$ for all $h > 0$.
The monotonicity and boundedness of $g(h)$ guarantee that $g(h)$ has a finite
limit as $h \to 0$. This gives us the first half of the problem, and the
second half is almost identical.


SOLUTION FOR EXERCISE 6.19. This is just more handiwork of the
Three Chord Lemma, which gives us for $0 < s$ and $0 < t$ with $y - s \in I$
and $y + t \in I$ that $\{f(y) - f(y - s)\}/s \le \{f(y + t) - f(y)\}/t$. From
Exercise 6.18 we have finite limits as $s, t \to 0$, and these limits are
$f'_-(y)$ and $f'_+(y)$ respectively. This gives us $f'_-(y) \le f'_+(y)$, and the other
bounds are no harder. Incidentally, the bound $f'_-(y) \le f'_+(y)$ may be
regarded as an "infinitesimal" version of the Three Chord Lemma.

For $a < x < s < t < y < b$ and $M = \max\{|f'_+(x)|, |f'_-(y)|\}$ the bound
(6.31) gives us $|f(t) - f(s)| \le M|t - s|$, which is more than we need to
say that $f$ is continuous.


CHAPTER 7: INTEGRAL INTERMEZZO 


SOLUTION FOR EXERCISE 7.1. The substitution gives us
$$
2f(x)g(x)f(y)g(y) \le f^2(x)g^2(y) + f^2(y)g^2(x),
$$
so integration over $[a, b] \times [a, b]$ yields
$$
2\int_a^b f(x)g(x)\,dx \int_a^b f(y)g(y)\,dy \le \int_a^b f^2(x)\,dx \int_a^b g^2(y)\,dy + \int_a^b f^2(y)\,dy \int_a^b g^2(x)\,dx,
$$
which we recognize to be Schwarz's inequality once it is rewritten with
only a single dummy variable.

This derivation was suggested by Claude Dellacherie, who also notes
that the continuous version of Lagrange's identity (14.50) follows by a
similar calculation provided that one begins with $(u - v)^2 = u^2 + v^2 - 2uv$.


SOLUTION FOR EXERCISE 7.2. Setting $D(f, g) = A(fg) - A(f)A(g)$ we
have the identity
$$
D(f, g) = \int \{f(x) - A(f)\}w^{1/2}(x)\,\{g(x) - A(g)\}w^{1/2}(x)\,dx,
$$
and Schwarz's inequality gives $D^2(f, g) \le D(f, f)D(g, g)$, which is our
target bound.


SOLUTION FOR EXERCISE 7.3. We first note that without loss of
generality we can assume that both of the integrals on the right of Heisenberg's
inequality are finite, or else there is nothing to prove. The inequality
(7.11) of Problem 7.3 then tells us that $f^2(x) = o(1/|x|)$ as $|x| \to \infty$, so
starting with the general integration by parts formula
$$
\int_A^B f^2(x)\,dx = x f^2(x)\Big|_A^B - 2\int_A^B x f(x)f'(x)\,dx,
$$
we can let $A \to -\infty$ and $B \to \infty$ to deduce that
$$
\int_{-\infty}^{\infty} |f(x)|^2\,dx = -2\int_{-\infty}^{\infty} x f(x)f'(x)\,dx \le 2\int_{-\infty}^{\infty} |x f(x)|\,|f'(x)|\,dx.
$$
Schwarz's inequality now finishes the job.
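Written as a ratio, Heisenberg's inequality says $(\int f^2)^2 / (4\int x^2 f^2 \int f'^2) \le 1$. The sketch below estimates this ratio with a midpoint rule on a truncated interval (our simplification) for the Gaussian $e^{-x^2/2}$, where the ratio equals 1 and the inequality is sharp, and for $1/(1+x^2)$, where exact values of the integrals give the ratio $1/2$.

```python
import math


def integrate(g, lo, hi, n=200_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def heisenberg_ratio(f, df, L):
    """(int f^2)^2 / (4 int x^2 f^2 * int f'^2), integrals truncated to [-L, L]."""
    a = integrate(lambda x: f(x) ** 2, -L, L)
    b = integrate(lambda x: (x * f(x)) ** 2, -L, L)
    c = integrate(lambda x: df(x) ** 2, -L, L)
    return a * a / (4 * b * c)


gauss = heisenberg_ratio(lambda x: math.exp(-x * x / 2),
                         lambda x: -x * math.exp(-x * x / 2), 12)
cauchy = heisenberg_ratio(lambda x: 1 / (1 + x * x),
                          lambda x: -2 * x / (1 + x * x) ** 2, 200)
print(gauss, cauchy)  # gauss is about 1 (equality case); cauchy is about 0.5
```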


SOLUTION FOR EXERCISE 7.4. One applies Jensen’s inequality (7.19) 
to the integrals in turn: 


+1 da 1 oe dy 1 
> 7 and T > , 
b Cg bees ew a bh Sag © eb are I 


SOLUTION FOR EXERCISE 7.5. Since $\sin t/t = \int_0^1 \cos(st)\,ds$, differentiation
under the integral sign gives us
$$
\left|\frac{d}{dt}\,\frac{\sin t}{t}\right| = \left|\int_0^1 s\sin(st)\,ds\right| \le \int_0^1 s\,ds = \frac12.
$$
To be complete, one should note that differentiation under the
integral sign is legitimate since for $f(t) = \cos(st)$ one can check that the
difference quotients $(f(t + h) - f(t))/h$ are uniformly bounded for all
$0 \le s \le 1$ and $0 < h \le 1$.
SOLUTION FOR EXERCISE 7.6. By the pattern of Problem 7.4, we find
$$
(B - A)^2 \le cB^2\log B\int_A^B \frac{dx}{f(x)},
$$
so setting $A = 2^j$ and $B = 2^{j+1}$ one finds
$$
\frac{1}{4c(j + 1)\log 2} \le \int_{2^j}^{2^{j+1}} \frac{dx}{f(x)} \quad\text{and hence}\quad \frac{1}{4c\log 2}\sum_{j=0}^{n}\frac{1}{j + 1} \le \int_1^{2^{n+1}} \frac{dx}{f(x)}.
$$
The conclusion then follows by the divergence of the harmonic series.


SOLUTION FOR EXERCISE 7.7. If we set $\delta = f(t)/|f'(t)|$, then the
triangle $T$ determined by the points $(t, f(t))$, $(t, 0)$, and $(t + \delta, 0)$ lies below
the graph of $f$, so the integral in the bound (7.26) is at least as large as
the area of $T$, which is $\frac12 f^2(t)/|f'(t)|$.


SOLUTION FOR EXERCISE 7.8. Since $0 \le \sin t \le 1$ on $[0, \pi/2]$ we can
slip $\sin t$ inside the integral to get a smaller one. Thus, we have
$$
\int_0^{\pi/2} (1 + \cos t)^n\,dt \ge \int_0^{\pi/2} (1 + \cos t)^n\sin t\,dt = \int_0^1 (1 + u)^n\,du = \frac{2^{n+1} - 1}{n + 1}.
$$
Similarly, one has $u/x \ge 1$ on $[x, \infty)$, so we have the bound
$$
\int_x^\infty e^{-u^2/2}\,du \le \int_x^\infty \frac{u}{x}e^{-u^2/2}\,du = \frac1x e^{-x^2/2}.
$$
In each case one slips in a factor to make life easy. Factors that are
bounded between 0 and 1 help us find lower bounds, and factors that
are always at least 1 help us find upper bounds.
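Both "slipped factor" bounds are easy to confirm numerically. The first integral is estimated with a midpoint rule; the Gaussian tail uses the standard identity $\int_x^\infty e^{-u^2/2}\,du = \sqrt{\pi/2}\,\mathrm{erfc}(x/\sqrt2)$. The sample values of $n$ and $x$ are arbitrary choices of ours.

```python
import math


def cos_power_integral(n, steps=100_000):
    """Midpoint-rule value of int_0^{pi/2} (1 + cos t)^n dt."""
    h = (math.pi / 2) / steps
    return h * sum((1 + math.cos((k + 0.5) * h)) ** n for k in range(steps))


def gaussian_tail(x):
    """int_x^inf exp(-u^2/2) du, via the complementary error function."""
    return math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))


for n in (1, 5, 20):
    print(n, cos_power_integral(n), (2 ** (n + 1) - 1) / (n + 1))  # first >= second
for x in (0.5, 1.0, 3.0):
    print(x, gaussian_tail(x), math.exp(-x * x / 2) / x)  # first <= second
```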


SOLUTION FOR EXERCISE 7.9. In order to argue by contradiction, we
assume without loss of generality that there is a sequence $x_n \to \infty$ such
that $f'(x_n) \ge \epsilon > 0$. Now, by Littlewood's Figure 7.2 (or by the triangle
lower bound of Exercise 7.7), we note that
$$
f(x_n + \delta) - f(x_n) = \int_{x_n}^{x_n + \delta} f'(t)\,dt \ge \frac12\epsilon^2/B,
$$
where $B = \sup|f''(x)| < \infty$ and $\delta = \epsilon/B$. This bound implies that
$f(x) \ne o(1)$, so we have our desired contradiction.


SOLUTION FOR EXERCISE 7.10. Differentiation suffices to confirm that
on $(0, 1)$ the map $t \mapsto t^{-1}\log(1 + t)$ is decreasing and $t \mapsto (1 + t^{-1})\log(1 + t)$ is
increasing, so we have the bounds


dt a 
log(1 + t) ae (l-2a)a~ “log a 


1-2 1-2 
= ——“(1+ 27!) log(1 x Dpeo= 
Tae +a~°)log(1+ 2x) < 2log ica 


To show 2 log 2 cannot be replaced by a smaller constant, note that 


1 1 
lim —— | log(1+t) a log 2 


ao1ll—2z e t 


since $|\log 2 - \log(1 + t)/t| < \epsilon$ for all $t$ with $|1 - t| < \delta(\epsilon)$.


Solutions to the Exercises 257 


SOLUTION FOR EXERCISE 7.11. If $W(x)$ is the integral of $w$ on $[a, x]$,
then $W(a) = 0$, $W(b) = 1$, and $W'(x) = w(x)$, so we have
$$
\int_a^b \{\log W(x)\}w(x)\,dx = \int_0^1 \log v\,dv = -1.
$$
We then have the relations 
b b 
exp | {log f(x) }w(a) dx = cexp | {log f(x)W (x) }w(a) da 


b b 
< cexp | {log f(x) }w(ax) dx < | f(x)w(a2) dx 


where we used first the fact that 0 < W(x) < 1 for all x € [a,)] and 
then we applied Jensen’s inequality. 


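SOLUTION FOR EXERCISE 7.12. Setting $I_f$ to the integral of $f$ we have
$$
\int_0^1 (f(x) - I_f)^2\,dx = (A - I_f)(I_f - \alpha) - \int_0^1 (A - f(x))(f(x) - \alpha)\,dx \le (A - I_f)(I_f - \alpha),
$$
and an analogous inequality holds for $g$. Schwarz's inequality then gives
$$
\begin{aligned}
\left|\int_0^1 f(x)g(x)\,dx - I_f I_g\right|^2 &= \left|\int_0^1 (f(x) - I_f)(g(x) - I_g)\,dx\right|^2 \\
&\le \int_0^1 (f(x) - I_f)^2\,dx\int_0^1 (g(x) - I_g)^2\,dx \\
&\le (A - I_f)(I_f - \alpha)(B - I_g)(I_g - \beta) \\
&\le \frac1{16}(A - \alpha)^2(B - \beta)^2,
\end{aligned}
$$
where in the last step we used the fact that $(U - x)(x - L) \le \frac14(U - L)^2$
for all $L \le x \le U$. Finally, to see that Grüss's inequality is sharp, set
$f(x) = 1$ for $0 \le x \le 1/2$, set $f(x) = 0$ for $1/2 < x \le 1$, and set
$g(x) = 1 - f(x)$ for all $0 \le x \le 1$.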
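Using step functions on $N$ equal cells of $[0,1]$ (our discretization), both the bound $|\int fg - I_f I_g| \le \frac14(A-\alpha)(B-\beta)$ and the extremal pair from the text can be checked directly. For the extremal pair, $\int fg = 0$ and $I_f = I_g = 1/2$, so the gap is exactly $-1/4$. The random value ranges $[-1, 2]$ and $[0, 3]$ are arbitrary.

```python
import random


def gruss_gap(f_vals, g_vals):
    """For step functions on N equal cells of [0, 1]: int fg - int f * int g."""
    N = len(f_vals)
    If, Ig = sum(f_vals) / N, sum(g_vals) / N
    return sum(x * y for x, y in zip(f_vals, g_vals)) / N - If * Ig


# The extremal pair from the text: f = 1 on the first half, 0 afterwards, g = 1 - f.
N = 1000
f = [1.0] * (N // 2) + [0.0] * (N // 2)
g = [1.0 - v for v in f]
print(gruss_gap(f, g))  # -0.25, so |gap| = (1/4)(A - alpha)(B - beta) with ranges [0, 1]

rng = random.Random(14)
worst = 0.0
for _ in range(500):
    fv = [rng.uniform(-1, 2) for _ in range(50)]  # alpha = -1, A = 2
    gv = [rng.uniform(0, 3) for _ in range(50)]   # beta = 0, B = 3
    worst = max(worst, abs(gruss_gap(fv, gv)))
print(worst, 3 * 3 / 4)  # worst stays below the Gruss bound 2.25
```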


CHAPTER 8: THE CONTINUUM OF MEANS 


SOLUTION FOR EXERCISE 8.1. Part (a) follows immediately from the
harmonic–arithmetic mean inequality for equal weights applied to the 3-vector
$(1/(y + z), 1/(x + z), 1/(x + y))$. For part (b), one first notes by Chebyshev's
order inequality that $\frac13\{x^p/(y + z) + y^p/(x + z) + z^p/(x + y)\}$ is bounded
below by the product
$$
\Bigl\{\frac{x^p + y^p + z^p}{3}\Bigr\}\Bigl\{\frac13\Bigl(\frac{1}{y + z} + \frac{1}{x + z} + \frac{1}{x + y}\Bigr)\Bigr\}.
$$
To complete the proof, one then applies the power mean inequality (with
$s = 1$ and $t = p$) to lower bound the first factor, and one uses part (a)
to lower bound the second factor.


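SOLUTION FOR EXERCISE 8.2. By the upside-down HM-AM inequality
(8.16) one has
$$
\frac{n^2}{a_1 + a_2 + \cdots + a_n} \le \frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}.
$$
If we replace each $a_k$ by $2S - a_k$, where $S = a_1 + a_2 + \cdots + a_n$, then the sum
of the new terms is $2nS - S = (2n - 1)S$, and the HM-AM bound yields
$$
\frac{n^2}{(2n - 1)S} \le \frac{1}{2S - a_1} + \frac{1}{2S - a_2} + \cdots + \frac{1}{2S - a_n}.
$$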


SOLUTION FOR EXERCISE 8.3. Both sides of the bound (8.29) are
homogeneous of order one in $(a_1, a_2, \ldots, a_n)$, so we can assume without
loss of generality that $a_1^{1/3} + a_2^{1/3} + \cdots + a_n^{1/3} = 1$. Given this, we only
need to show $a_1^{1/2} + a_2^{1/2} + \cdots + a_n^{1/2} \le 1$, and this is remarkably easy.
By the normalization, we have $a_k \le 1$ for all $1 \le k \le n$, so we also
have $a_k^{1/2} \le a_k^{1/3}$ for all $1 \le k \le n$, and we just take the sum to get
our target bound. One might want to reflect on what made this exercise
so much easier than the proof of the power mean inequality (8.10). For
part (b), if we take $f(x) = x^6$ to minimize arithmetic, then we see that
the putative bound (8.30) falsely asserts $1/16 \le 1/27$.
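The sketch checks part (a) in the form $(\sum_k a_k^{1/2})^2 \le (\sum_k a_k^{1/3})^3$, which is our reading of (8.29) given the normalization above. For part (b) we assume (8.30) is the integral analogue $(\int_0^1 f^{1/2})^2 \le (\int_0^1 f^{1/3})^3$. With $f(x) = x^6$ that reads $(\int_0^1 x^3\,dx)^2 = 1/16$ against $(\int_0^1 x^2\,dx)^3 = 1/27$, and it fails.

```python
import random
from fractions import Fraction

random.seed(15)
ratios = []
for _ in range(1000):
    a = [random.uniform(0.01, 10) for _ in range(random.randint(1, 8))]
    lhs = sum(x ** 0.5 for x in a) ** 2
    rhs = sum(x ** (1 / 3) for x in a) ** 3
    ratios.append(lhs / rhs)

# Integral analogue on [0, 1] with f(x) = x^6.
int_sqrt_f = Fraction(1, 4)  # int_0^1 (x^6)^(1/2) dx = int_0^1 x^3 dx
int_cbrt_f = Fraction(1, 3)  # int_0^1 (x^6)^(1/3) dx = int_0^1 x^2 dx
print(max(ratios), int_sqrt_f ** 2, int_cbrt_f ** 3)  # ratio <= 1; 1/16 > 1/27
```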


SOLUTION FOR EXERCISE 8.4. We only need to consider $p \in [a,b]$, and
in that case we can write

$$F(p) = \max\Big\{\frac{p-a}{a}, \frac{b-p}{b}\Big\}.$$

The identity

$$\frac{a}{a+b}\cdot\frac{p-a}{a} + \frac{b}{a+b}\cdot\frac{b-p}{b} = \frac{b-a}{a+b}$$

tells us that $(b-a)/(a+b)$ is a weighted mean of $(p-a)/a$ and $(b-p)/b$,
so we always have the bound

$$F(p) = \max\Big\{\frac{p-a}{a}, \frac{b-p}{b}\Big\} \ge \frac{b-a}{a+b}.$$

Moreover, we have strict inequality here unless $(p-a)/a = (b-p)/b$,
so, as Pólya (1950) observed, the unique minimum of $F(p)$ is attained
at $p^* = 2ab/(a+b)$, which is the harmonic mean of $a$ and $b$.
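A quick numerical check of this conclusion may help. The sketch below is an illustration, not part of the original solution: it scans a fine grid of $p \in [a,b]$ for the arbitrary sample values $a = 2$ and $b = 7$, then compares the grid minimizer of $F$ with the harmonic mean $2ab/(a+b)$ and the minimum value with $(b-a)/(a+b)$.

```python
def F(p, a, b):
    """F(p) = max{(p - a)/a, (b - p)/b}, the larger relative error at the endpoints."""
    return max((p - a) / a, (b - p) / b)


def grid_minimizer(a, b, steps=200_000):
    """Return (p, F(p)) for the grid point p in [a, b] where F is smallest."""
    best_p, best_val = a, F(a, a, b)
    for i in range(1, steps + 1):
        p = a + (b - a) * i / steps
        val = F(p, a, b)
        if val < best_val:
            best_p, best_val = p, val
    return best_p, best_val


a, b = 2.0, 7.0
p_grid, f_grid = grid_minimizer(a, b)
harmonic_mean = 2 * a * b / (a + b)   # 28/9 for these sample values
lower_bound = (b - a) / (a + b)       # 5/9 for these sample values
print(p_grid, harmonic_mean)
print(f_grid, lower_bound)
```

Since the grid spacing is $2.5\times 10^{-5}$, both comparisons should agree to about four decimal places.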


SOLUTION FOR EXERCISE 8.5. For all $x \in D$ we have the bound

$$(a_1a_2\cdots a_n)^{1/n} = (a_1x_1\,a_2x_2\cdots a_nx_n)^{1/n} \le \frac{1}{n}\sum_{k=1}^n a_kx_k \qquad (14.57)$$

by the AM-GM inequality, and we have equality here if and only if $a_kx_k$
does not depend on $k$. If we take $x_k = (a_1a_2\cdots a_n)^{1/n}/a_k$, then $x \in D$
and equality holds in the bound (14.57). This is all one needs to
justify the identity (8.33).

Now, to prove the bound (2.31) on page 34, one just notes

$$\min_{x\in D}\frac{1}{n}\sum_{k=1}^n a_kx_k + \min_{x\in D}\frac{1}{n}\sum_{k=1}^n b_kx_k \le \min_{x\in D}\frac{1}{n}\sum_{k=1}^n (a_k + b_k)x_k,$$

since two choices are better than one. Incidentally, this type of argument
is exploited systematically in Beckenbach and Bellman (1965), where the
formula (8.33) is called the quasilinear representation of the geometric
mean.
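The identity (8.33) is easy to test numerically. In the sketch below we take $D$ to be the set of positive vectors $x$ with $x_1x_2\cdots x_n = 1$. That is our reading of the constraint that makes the first equality in (14.57) hold. Random points of $D$ never do better than the geometric mean, and the choice $x_k = (a_1\cdots a_n)^{1/n}/a_k$ attains it exactly. The sample vector and the random seed are arbitrary.

```python
import math
import random


def geometric_mean(a):
    return math.exp(sum(math.log(ak) for ak in a) / len(a))


def average_of_products(a, x):
    """(1/n) * sum_k a_k x_k, the quantity minimized over D in (8.33)."""
    return sum(ak * xk for ak, xk in zip(a, x)) / len(a)


def random_point_of_D(n, rng):
    """A positive vector with product 1, obtained by dividing by its geometric mean."""
    y = [rng.uniform(0.1, 10.0) for _ in range(n)]
    g = geometric_mean(y)
    return [yk / g for yk in y]


rng = random.Random(0)
a = [rng.uniform(0.5, 5.0) for _ in range(5)]
gm = geometric_mean(a)

# AM-GM: every feasible x gives an upper bound for the geometric mean.
sampled = [average_of_products(a, random_point_of_D(len(a), rng)) for _ in range(10_000)]

# The minimizer from the solution: x_k = gm / a_k, so a_k x_k = gm for every k.
x_star = [gm / ak for ak in a]
print(gm, min(sampled), average_of_products(a, x_star))
```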


SOLUTION FOR EXERCISE 8.6. The half-angle formula for sine gives

$$\begin{aligned}
\frac{\sin x}{x} &= \frac{2\sin(x/2)\cos(x/2)}{x} = \cos(x/2)\Big\{\frac{\sin(x/2)}{x/2}\Big\}\\
&= \cos(x/2)\cos(x/4)\Big\{\frac{\sin(x/4)}{x/4}\Big\} = \cdots\\
&= \cos(x/2)\cos(x/4)\cdots\cos(x/2^k)\Big\{\frac{\sin(x/2^k)}{x/2^k}\Big\},
\end{aligned}$$

and as $k \to \infty$ the bracketed term goes to 1 since $\sin t = t + O(t^3)$
as $t \to 0$. Upon setting $x = \pi/2$, one gets the second formula after
computing the successive values of cosine with help from its half-angle
formula. Maor (1998, pp. 139-143) gives a full discussion of Viète's
formula, including a fascinating geometric proof.
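Setting $x = \pi/2$ gives Viète's product $2/\pi = \frac{\sqrt2}{2}\cdot\frac{\sqrt{2+\sqrt2}}{2}\cdots$, because $2\cos(\theta/2) = \sqrt{2 + 2\cos\theta}$. The short sketch below is only an illustration. It builds the partial products with that half-angle recursion and compares them with $2/\pi$. By the identity above, the $k$-factor product equals $1/\{2^k\sin(\pi/2^{k+1})\}$ exactly.

```python
import math


def viete_partial_product(k):
    """Product of the first k factors cos(pi/4) cos(pi/8) ... cos(pi/2^(k+1)).

    Each factor is computed as radical/2, where the radicals sqrt(2),
    sqrt(2 + sqrt(2)), ... come from the half-angle formula
    2 cos(t/2) = sqrt(2 + 2 cos t).
    """
    product, radical = 1.0, 0.0
    for _ in range(k):
        radical = math.sqrt(2.0 + radical)
        product *= radical / 2.0
    return product


for k in (1, 2, 5, 10, 20):
    print(k, viete_partial_product(k), 2 / math.pi)
```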


SOLUTION FOR EXERCISE 8.7. Our assumptions give us the bound
$(f(t_0 + h) - f(t_0))/h \ge 0$ for all $h \in (0, \Delta]$, and now we just let $h \to 0$
to prove the first claim. To address the second claim, one first notes by
the power mean inequality, or by Jensen's inequality, that one has

$$f(t) = \sum_{k=1}^n p_kx_k^t - \Big(\sum_{k=1}^n p_kx_k\Big)^t \ge 0 \quad\text{for all } t \in [1,\infty).$$

Since $0 = f(1) \le f(t)$ for $1 \le t$, we also have $f'(1) \ge 0$, and this is
precisely the bound (8.35).


SOLUTION FOR EXERCISE 8.8. We argue by contradiction, and we begin
by assuming that $(a_{1k}, a_{2k}, \ldots, a_{nk})$ does not converge to the constant
limit $\boldsymbol{\mu} = (\mu, \mu, \ldots, \mu)$. For each $j$, the sequence $\{a_{jk} : k = 1, 2, \ldots\}$
is bounded, so we can find a subsequence $k_s$, $s = 1, 2, \ldots$, such that
$(a_{1k_s}, a_{2k_s}, \ldots, a_{nk_s})$ converges to $\boldsymbol{\nu} = (\nu_1, \nu_2, \ldots, \nu_n)$ with $\boldsymbol{\nu} \ne \boldsymbol{\mu}$. Letting $s \to \infty$ and applying hypotheses (i) and (ii), we find

$$\frac{\nu_1 + \nu_2 + \cdots + \nu_n}{n} = \mu \qquad\text{and}\qquad \frac{\nu_1^p + \nu_2^p + \cdots + \nu_n^p}{n} = \mu^p.$$

Now, by Problem 8.1 we see from these two identities and the case of
equality in the power mean inequality that $\nu_j = \mu$ for all $j$, but
this contradicts our assumption $\boldsymbol{\nu} \ne \boldsymbol{\mu}$, so the proof is complete.

Niven and Zuckerman (1951) consider only $p = 2$, and in this case
Knuth (1968, p. 135) notes that one can give a very easy proof by considering the sum $\sum_j (a_{jk} - \mu)^2$. The benefit of the subsequence argument
is that it works for all $p > 1$, and, more generally, it reminds
us that there are many situations where the characterization of the case of
equality can be used to prove a limit theorem.

Subsequence arguments often yield a qualitative stability result while
assuming little more than the ability to identify the case where equality
holds. When more is known, specialized arguments may yield more
powerful quantitative stability results; here the two leading examples
are perhaps the stability result for the AM-GM inequality (page 35) and
the stability result for Hölder's inequality (page 144).


SOLUTION FOR EXERCISE 8.9. First note that the hypothesis yields
the telescoping relationship

$$\sum_{i=1}^{n-k}(x_{i+k} - x_i) = (x_n + x_{n-1} + \cdots + x_{n-k+1}) - (x_1 + x_2 + \cdots + x_k) \le 2k,$$

so the inverted HM-AM inequality (8.16) gives us the informative bound

$$\frac{(n-k)^2}{2k} \le \sum_{i=1}^{n-k}\frac{1}{x_{i+k} - x_i}.$$

Now, by summation we have


n—-ln-k 
1<j<kén “3 — ba j= Lith — Bi 
n-1 
(n 7 Ne n? 1 1 
oo a cs SOL & 5 MTom, 
7 k=1 2k 2 ooo - 2n 


so the bound $H_{n-1} = 1 + \frac{1}{2} + \cdots + \frac{1}{n-1} \ge \int_1^n dx/x = \log n$ completes
the first part.
For the second part, we note that for any permutation $\sigma$ one has


2 SS aa 


vi — 2 
1<j<k<n 5 1<k<n j=l o(9)| 
k-1 
1 


<(n-1 —________ 
= (n eee » |Z (x) 


— £o(5)| 


This argument of Erdős (1961, p. 237) speaks volumes about the rich
possibilities of simple averages.


CHAPTER 9: HÖLDER'S INEQUALITY


SOLUTION FOR EXERCISE 9.1. For the first bound one applies Hölder's
inequality with $p = 5/4$ and $q = 5$ and finishes with the telescoping
identity $1/(1\cdot 2) + 1/(2\cdot 3) + \cdots + 1/\{n(n+1)\} = 1 - 1/(n+1)$. For
the second bound one uses $p = 4/3$ and $q = 4$ and finishes with Euler's
classic sum $1 + 1/2^2 + 1/3^2 + \cdots = \pi^2/6$, while for the third
bound one uses $p = 3/2$ and $q = 3$ and finishes with the geometric sum
$1 + x^3 + x^6 + \cdots = 1/(1 - x^3)$.


SOLUTION FOR EXERCISE 9.2. Here we write $P(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0$,
$A_p = (\sum_{k=0}^{n-1}|a_k|^p)^{1/p}$, and $1/p + 1/q = 1$. Consider $z$ such that $|z| > 1$ and note
by Hölder's inequality that one has the bound

$$\Big|\sum_{k=0}^{n-1}a_kz^k\Big| \le A_p\Big(\sum_{k=0}^{n-1}|z|^{kq}\Big)^{1/q},$$

so we also have

$$|P(z)| \ge |z|^n\Big\{1 - A_p\Big(\sum_{k=0}^{n-1}|z|^{(k-n)q}\Big)^{1/q}\Big\},$$

and by summation of the geometric series, since $\sum_{k=0}^{n-1}|z|^{(k-n)q} < \sum_{j=1}^\infty|z|^{-jq} = 1/(|z|^q - 1)$,

$$|P(z)| > |z|^n\Big\{1 - \frac{A_p}{(|z|^q - 1)^{1/q}}\Big\}.$$

Thus, we have $|P(z)| > 0$ if $A_p/(|z|^q - 1)^{1/q} < 1$. That is, we have
$|P(z)| > 0$ for all $z$ such that $|z| > (1 + A_p^q)^{1/q}$.

The bound (9.29) for the inclusion radius is due to M. Kuniyeda, and
it provides a useful reminder of how one can benefit from the flexibility
afforded by Hölder's inequality. Here, for a given polynomial, a wise
choice of the power $p$ sometimes leads to an inclusion radius that is
dramatically smaller than the one given simply by taking $p = 2$. This
result and many other bounds for the inclusion radius are developed in
Mignotte and Stefanescu (1999).
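One can watch the flexibility in $p$ at work with a small experiment. The sketch below assumes the monic form $P(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_0$ with $A_p = (\sum_{k=0}^{n-1}|a_k|^p)^{1/p}$ and $1/p + 1/q = 1$. It builds $P$ from chosen roots, so the true root moduli are known without a root finder. It then checks that every root lies in the disk of radius $(1 + A_p^q)^{1/q}$ for several values of $p$. The roots are arbitrary sample values.

```python
def monic_coefficients(roots):
    """Coefficients [c_0, c_1, ..., c_n] (lowest degree first) of prod_r (z - r)."""
    coeffs = [1 + 0j]
    for r in roots:
        shifted = [0j] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            shifted[i + 1] += c   # contribution of z * (c z^i)
            shifted[i] -= r * c   # contribution of -r * (c z^i)
        coeffs = shifted
    return coeffs


def kuniyeda_radius(coeffs, p):
    """(1 + A_p^q)^(1/q) for a monic polynomial; requires p > 1."""
    q = p / (p - 1)
    A_p = sum(abs(c) ** p for c in coeffs[:-1]) ** (1 / p)
    return (1 + A_p ** q) ** (1 / q)


roots = [0.5, -1.2 + 0.7j, -1.2 - 0.7j, 2j, -2j]
coeffs = monic_coefficients(roots)
largest_root = max(abs(r) for r in roots)
for p in (1.25, 1.5, 2.0, 3.0, 6.0):
    print(p, kuniyeda_radius(coeffs, p), largest_root)
```

Printing the radius for several $p$ shows how much the choice of exponent can matter for a particular coefficient pattern.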


SOLUTION FOR EXERCISE 9.3. This method is worth understanding, but
the exercise does not leave much to do. First apply Cauchy's inequality
to the sum of $\alpha_j\beta_j$ where $\alpha_j = a_jb_jc_jd_j$ and $\beta_j = e_jf_jg_jh_j$, then repeat
the natural splitting twice more. It is obvious (but easy to overlook!)
that each $p \in [1,\infty)$ can be approximated as closely as we like by a
rational number of the form $p = 2^k/j$ where $1 \le j \le 2^k$.


SOLUTION FOR EXERCISE 9.4. Apply the Hölder inequality given by
Exercise 9.7 with $D = [0,\infty)$ and $w(x) = \phi(x)$ with the natural choices
$f(x) = x^{(1-\alpha)t_0}$, $g(x) = x^{\alpha t_1}$, $p = 1/(1-\alpha)$ and $q = 1/\alpha$. One consequence of this bound is that if the $t$th moment is infinite, then either the
$t_0$th or the $t_1$th moment must be infinite.


SOLUTION FOR EXERCISE 9.5. Equality in the bound (9.30) gives us

$$\Big|\sum_{k=1}^n a_kb_k\Big| = \sum_{k=1}^n|a_kb_k| = \Big(\sum_{k=1}^n|a_k|^p\Big)^{1/p}\Big(\sum_{k=1}^n|b_k|^q\Big)^{1/q}. \qquad (14.58)$$

Now, if $|a_1|, |a_2|, \ldots, |a_n|$ is a nonzero sequence, then the real variable
characterization on page 136 tells us that the second equality holds if
and only if there exists a constant $\lambda \ge 0$ such that $\lambda|a_k|^p = |b_k|^q$ for
all $1 \le k \le n$.

The novel issue here is to discover when the first equality holds. If
we set $a_kb_k = \rho_ke^{i\theta_k}$ where $\rho_k \ge 0$ and $\theta_k \in [0, 2\pi)$, and if we further set
$w_k = \rho_k/(\rho_1 + \rho_2 + \cdots + \rho_n)$, then the first equality holds exactly when
the average $w_1e^{i\theta_1} + w_2e^{i\theta_2} + \cdots + w_ne^{i\theta_n}$ is on the boundary of the unit
disk, and this is possible if and only if there exists a $\theta$ such that $\theta = \theta_k$
for all $k$ such that $w_k \ne 0$. In other words, the first equality holds if and
only if the values $\arg\{a_kb_k\}$ are equal for all $k$ for which $\arg\{a_kb_k\}$ is
well defined.


SOLUTION FOR EXERCISE 9.6. One checks by taking derivatives that
$\phi(x) = (1 + x^{1/p})^p$ satisfies $\phi''(x) = (1-p)x^{-2+1/p}(1 + x^{1/p})^{-2+p}/p$, and this is negative since $p > 1$
and $x > 0$. One then applies Jensen's inequality (for concave functions)
to $w_k = |a_k|^p$ and $x_k = |b_k|^p/|a_k|^p$; the rest is arithmetic. This modestly
miraculous proof is just one more example of how much one can achieve
with Jensen's inequality, given the wisdom to choose the "right" function.


SOLUTION FOR EXERCISE 9.7. Without loss of generality, one can assume
that the integrals of the upper bound do not vanish. Call these
integrals $I_1$ and $I_2$, apply Young's inequality (9.6) to $u = |f(x)|/I_1^{1/p}$
and $v = |g(x)|/I_2^{1/q}$, multiply by $w(x)$, and integrate. Hölder's inequality
then follows by arithmetic. For a thorough job, one may want to retrace
this argument to sort out the case of equality.


SOLUTION FOR EXERCISE 9.8. The natural calculus exercise shows the
Legendre transform of $f(x) = x^p/p$ is $g(y) = y^q/q$ where $q = p/(p-1)$.
Thus, the bound (9.33) simply puts Young's inequality (9.6) into a larger
context. Similarly, one finds the Legendre transform pairs:

$$f(x) = e^x,\quad g(y) = y\log y - y \qquad\text{and}\qquad \phi(x) = x\log x - x,\quad \gamma(y) = e^y.$$

This example suggests the conjecture that for a convex function, the
Legendre transform of its Legendre transform is the original function. This conjecture is indeed true for convex functions that are lower semicontinuous (the Fenchel-Moreau theorem). Finally, for part (c), we take $0 < p < 1$ and note
that $g(py_1 + (1-p)y_2) = \sup_{x\in D}\{x(py_1 + (1-p)y_2) - f(x)\}$ also equals
$\sup_{x\in D}\big(p\{xy_1 - f(x)\} + (1-p)\{xy_2 - f(x)\}\big)$. Since this is bounded
by $\sup_{x\in D}p\{xy_1 - f(x)\} + \sup_{x\in D}(1-p)\{xy_2 - f(x)\}$, which equals
$pg(y_1) + (1-p)g(y_2)$, we see that $g$ is convex.
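The transform pairs above can be checked by brute force. In this sketch the supremum in $g(y) = \sup_x\{xy - f(x)\}$ is replaced by a maximum over a fine grid. The grids, the exponent $p = 3$, and the sample values of $y$ are arbitrary choices. For $f(x) = x^p/p$ we restrict to $x \ge 0$ and $y > 0$, which is the setting of Young's inequality.

```python
import math


def legendre_on_grid(f, y, xs):
    """Approximate g(y) = sup_x {x*y - f(x)} by the maximum over the grid xs."""
    return max(x * y - f(x) for x in xs)


p = 3.0
q = p / (p - 1)                                      # conjugate exponent, here 3/2
grid_nonneg = [i / 1000 for i in range(10_001)]      # x in [0, 10]
grid_wide = [-20 + i / 1000 for i in range(30_001)]  # x in [-20, 10]

for y in (0.5, 1.0, 2.0):
    power_pair = (legendre_on_grid(lambda x: x ** p / p, y, grid_nonneg), y ** q / q)
    exp_pair = (legendre_on_grid(math.exp, y, grid_wide), y * math.log(y) - y)
    print(y, power_pair, exp_pair)
```

Each printed pair compares the grid value with the closed form $y^q/q$ or $y\log y - y$.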


SOLUTION FOR EXERCISE 9.9. Part (a) follows by applying Hölder's
inequality for the conjugate pair $(p/r, q/r)$ to the splitting $|a_j|^r\cdot|b_j|^r$. Part
(b) can be obtained by two similar applications of Hölder's inequality,
but one saves arithmetic and gains insight by following Riesz's pattern.
By the AM-GM inequality one has $xyz \le x^p/p + y^q/q + z^r/r$ for $x, y, z \ge 0$ and $1/p + 1/q + 1/r = 1$, and, after
applying this to the corresponding normalized values $a_j$, $b_j$, and $c_j$, one
can finish exactly as before.


SOLUTION FOR EXERCISE 9.10. The historical Hölder inequality follows
directly from the weighted Jensen inequality (9.31) with $\phi(x) = x^p$, a
proof which suggests why Hölder might have viewed the inequality (9.31)
as his main result.

To pass from the bound (9.34) to the modern Hölder inequality (9.1),
one takes $w_k = b_k^q$ and $y_k = a_k/b_k^{q-1}$. To pass from the bound (9.1) to the
historical Hölder inequality, one uses the splitting $w_ky_k = w_k^{1/q}\cdot w_k^{1/p}y_k$.
SOLUTION FOR EXERCISE 9.11. The hint leads one to the bound


D {eat + - oul < {of Sraph" +a-of roa h} 


and, if we let $s \to \infty$, then the formula (8.5) gives us

$$\sum_{k=1}^n a_k^\theta b_k^{1-\theta} \le \Big\{\sum_{k=1}^n a_k\Big\}^\theta\Big\{\sum_{k=1}^n b_k\Big\}^{1-\theta}, \qquad (14.59)$$

which is Hölder's inequality after one sets $\theta = 1/p$. This derivation of
Kedlaya (1999) and Maligranda (2000) serves as a reminder that the
formula (8.5) gives us another general tool for effecting the "additive
to multiplicative" transformation which is often needed in the theory of
inequalities.

SOLUTION FOR EXERCISE 9.12. Fix $m$ and use induction on $n$. For
$n = 1$ we have $w_1 = 1$, and the inequality is trivial. For the induction
step apply Hölder's inequality to $u_1v_1 + u_2v_2 + \cdots + u_mv_m$ where

$$u_j = \prod_{k=1}^{n-1}a_{jk}^{w_k},\qquad v_j = a_{jn}^{w_n},\qquad p = 1/(w_1 + w_2 + \cdots + w_{n-1}),\qquad\text{and}\qquad q = 1/w_n.$$

This gives us the bound

$$\sum_{j=1}^m\prod_{k=1}^n a_{jk}^{w_k} \le \Big\{\sum_{j=1}^m\prod_{k=1}^{n-1}a_{jk}^{w_k/(w_1 + \cdots + w_{n-1})}\Big\}^{w_1 + \cdots + w_{n-1}}\Big\{\sum_{j=1}^m a_{jn}\Big\}^{w_n},$$

and the proof is then completed by applying the induction hypothesis
to the bracketed sum of $(n-1)$-fold products.

To prove the inequality (9.36), we first apply the bound (9.35) with
$w_1 = w_2 = w_3 = 1/3$ to the array

$$A = \begin{pmatrix} x & x & x\\ x & \sqrt{xy} & y\\ x & y & z \end{pmatrix}$$

to find $x + (xy)^{1/2} + (xyz)^{1/3} \le \{(3x)(x + y + \sqrt{xy})(x + y + z)\}^{1/3}$. Now,
by applying $\sqrt{xy} \le (x + y)/2$ one finds

$$3x(x + y + \sqrt{xy})(x + y + z) \le 3x\cdot\frac{3(x + y)}{2}\cdot(x + y + z) = 27\,x\cdot\frac{x + y}{2}\cdot\frac{x + y + z}{3},$$

and, after taking cube roots and dividing by 3, this completes the proof of the bound (9.36) suggested by Lozansky
and Rousseau (1996, p. 127). The bound (9.36) is due to Finbarr Holland
who also conjectured the natural $n$-variable analogue, a result which was
subsequently proved by Kiran Kedlaya.
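As we read it, the bound (9.36) is the $n = 3$ case of Holland's inequality, $\frac{1}{3}\{x + \sqrt{xy} + \sqrt[3]{xyz}\} \le \{x\cdot\frac{x+y}{2}\cdot\frac{x+y+z}{3}\}^{1/3}$. A random spot check like the one below is illustrative only, and the sample distribution and seed are arbitrary. It is a cheap way to catch a misremembered constant before investing in a proof.

```python
import math
import random


def holland_gap(x, y, z):
    """Right side minus left side of
    (x + sqrt(xy) + (xyz)^(1/3)) / 3 <= (x * (x+y)/2 * (x+y+z)/3)^(1/3)."""
    left = (x + math.sqrt(x * y) + (x * y * z) ** (1 / 3)) / 3
    right = (x * (x + y) / 2 * (x + y + z) / 3) ** (1 / 3)
    return right - left


rng = random.Random(1)
samples = [(rng.random(), rng.random(), rng.random()) for _ in range(100_000)]
worst = min(holland_gap(*s) for s in samples)
print(worst)
print(holland_gap(1.0, 1.0, 1.0))   # the equality case x = y = z
```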


SOLUTION FOR EXERCISE 9.13. Taking the bound (9.38) and the in- 
version of the bound (9.39) one finds 


(sins en 2 ee oo2?3 26 paren < (5,/5,)8°, 
which we can write more leanly as 


(S./Sp) VE? < (8:/8.)/"? or str < St-898-", as claimed. 


SOLUTION FOR EXERCISE 9.14. As in Problem 9.6 we begin by noting
that scaling and the converse Hölder inequality imply that it suffices to
show that we have the bound
CikERY; < Mo for all ||x||, <1 and |ly|]v <1, (14.60) 
j=l k=1 


where $t' = t/(t-1)$ is the conjugate power for $t$ (so $1/t + 1/t' = 1$). Also,
just as before, the assumption that $c_{jk} \ge 0$ for all $j, k$ implies that it
suffices for us to consider nonnegative values for the $x$'s and the $y$'s. To continue
with the earlier pattern, we need to set up the splitting trick. Here there
are many possibilities, and an unguided search can be frustrating, but
there are some observations that can help direct our search.

First, we know that we must end up with the sum of the $x$'s and the
sum of the $y$'s separated from the $c_{jk}$ factors; this is the only way we can
use the hypotheses $\|x\|_s \le 1$ and $\|y\|_{t'} \le 1$. Also, the definition of
the splitting will surely need to exploit the defining relations (9.42) for
the three variables $s$, $t$, and $\theta$.

When we try to combine these hints, we may note that 


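while for the conjugate powers $t' = t/(t-1)$, $t_0' = t_0/(t_0-1)$, and
$t_1' = t_1/(t_1-1)$ we have the analogous relation

$$\frac{1}{t'} = 1 - \frac{1}{t} = (1-\theta)\Big(1 - \frac{1}{t_0}\Big) + \theta\Big(1 - \frac{1}{t_1}\Big) = \frac{1-\theta}{t_0'} + \frac{\theta}{t_1'}.$$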

Now we just need to use these relations to create a splitting of $c_{jk}x_ky_j$
that will bring the relevant sums of powers of the $x$'s and the $y$'s into view after an application
of Hölder's inequality. With just a little experimentation, one should
then find the bound


qt nt m 
yD owe 


j=l k=1 j=1k 


ay 


j=l 


(cjnarg! ys, a oe: "(Crk xy ty i pu 
1 


& 


. 1-0 
s c wnt” haf 4 ) 

m n ae 0 

(Sof Dewey by) ‘) 


j=l *k=1 


x 


This bound is a grand champion among splitting trick estimates and, 
after our eyes adjust to the clutter, we see that it is the natural culmi- 
nation of a line of argument which we have used several times before. 

To complete our estimate, we need to bound the last two factors. For
the first factor we naturally want to apply Hölder's inequality for the
conjugate pair $t_0$ and $t_0' = t_0/(t_0 - 1)$. We then find that


m ae hes to\ 1/to 7 m x 1/t 
os Sons _— 3 Sensi yy 


j=1 j=1 = j=1 


n 1/so 7 m 1/to 
< Mo( >t) (du) < Mo, 
k=1 j=l 
where in the second inequality we applied the bound ||Tx||z, < Mo||x||s, 
to the vector x = (aj, 23,..., 2%). We can then bound the second factor 
in exactly the same way to find 


5 {> 2° elm by <M, 


so when we return to our first bound we have the estimate 


m n 


» ere < M?M,~°. 


This is exactly what we needed to complete the solution. 


SOLUTION FOR EXERCISE 9.15. One can proceed barehanded, but it is
also instructive to apply the result of the preceding exercise. From the
hypothesis we have $\|Tx\|_2 \le \|x\|_2$, and from the definition of $M$ one
finds $\|Tx\|_\infty \le M\|x\|_1$. Since the linear system

$$\frac{1}{p} = \frac{1-\theta}{2} + \frac{\theta}{1}, \qquad \frac{p-1}{p} = \frac{1-\theta}{2} + \frac{\theta}{\infty} \quad (\text{with } \theta/\infty = 0)$$

has $\theta = (2-p)/p \in [0,1]$ as its unique solution, the bound (9.43) is
indeed a corollary of Exercise 9.14.


CHAPTER 10: HILBERT’S INEQUALITY 


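SOLUTION FOR EXERCISE 10.1. The proof fits in a single line: just note

$$\sum_{j,k=1}^n a_ja_kx^{j+k-1} = \frac{1}{x}\Big(\sum_{j=1}^n a_jx^j\Big)^2 \ge 0 \qquad\text{for } x \in (0,1],$$

and integrate over $[0,1]$. For the general case, one naturally uses the
representation $1/\lambda = \int_0^1 x^{\lambda-1}\,dx$.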

This problem is a reminder that there are many circumstances where 
dramatic progress is made possible by replacing a number (or a function) 
with an appropriate integral. Although this example and the one given 
by Exercise 7.5, page 116, are simple, the basic theme has countless 
variations. Some of these variations are quite deep.
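As a numerical illustration, the quadratic forms $\sum_{j,k}a_ja_k/(\lambda_j + \lambda_k)$ should never be negative, both for $\lambda_j = j$ and for arbitrary positive $\lambda_j$. The sketch below evaluates these forms directly on random data with an arbitrary seed. Such Cauchy-type matrices are badly conditioned, so the check allows a small rounding tolerance rather than demanding exact nonnegativity in floating point.

```python
import random


def cauchy_form(a, lam):
    """sum_{j,k} a_j a_k / (lam_j + lam_k)."""
    return sum(aj * ak / (lj + lk)
               for aj, lj in zip(a, lam)
               for ak, lk in zip(a, lam))


rng = random.Random(2)
n = 8
values = []
for _ in range(2_000):
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    values.append(cauchy_form(a, list(range(1, n + 1))))   # lambda_j = j
    lam = [rng.uniform(0.01, 5.0) for _ in range(n)]
    values.append(cauchy_form(a, lam))                      # general lambda_j > 0
print(min(values))
```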


SOLUTION FOR EXERCISE 10.2. We substitute, switch order, apply the 
bound (10.18), switch again, and finish with Cauchy’s inequality to find 


So ajnhjnejye 
jik 
7 Lf, (Tanase )YK ga (x )) dx 
< |, “(> jnsfie)P) gos poe) de 


< (Dal, incertae) (Sout f jav(o))? dr) 


The bound @fM||x|lo|ly||2 now follows from the assumption (10.21). 


1/2 


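SOLUTION FOR EXERCISE 10.3. To mimic our proof of Hilbert's inequality we take $\lambda > 0$ and use the analogous splitting to find

$$\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{a_mb_n}{\max(m,n)} = \sum_{m=1}^\infty\sum_{n=1}^\infty\Big\{\frac{a_m}{\max^{1/2}(m,n)}\Big(\frac{m}{n}\Big)^\lambda\Big\}\Big\{\frac{b_n}{\max^{1/2}(m,n)}\Big(\frac{n}{m}\Big)^\lambda\Big\}.$$

By Cauchy's inequality, the square of the double sum is bounded by the
product of the sum given by

$$\sum_{m=1}^\infty\sum_{n=1}^\infty\frac{a_m^2}{\max(m,n)}\Big(\frac{m}{n}\Big)^{2\lambda} = \sum_{m=1}^\infty a_m^2\sum_{n=1}^\infty\frac{1}{\max(m,n)}\Big(\frac{m}{n}\Big)^{2\lambda}$$

and the corresponding sum containing $\{b_n^2\}$. If we take $\lambda = 1/4$, we have by integral comparison

$$\sum_{n=1}^\infty\frac{1}{\max(m,n)}\Big(\frac{m}{n}\Big)^{1/2} = \frac{1}{m^{1/2}}\sum_{n=1}^m\frac{1}{n^{1/2}} + m^{1/2}\sum_{n=m+1}^\infty\frac{1}{n^{3/2}} \le 2 + 2 = 4,$$

so, to complete the proof we only need to note that the $\{b_n^2\}$ sum satisfies
an exactly analogous bound.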

Finally, the usual "stress testing" method shows that 4 cannot be
replaced with a smaller value. After setting $a_n = b_n = n^{-1/2-\epsilon}$, one
checks that


= 1 oe D 
Daeg ON and >» == +0(1). 


m=l1n=1 


Peeking ahead and taking $K(x,y) = 1/\max(x,y)$ in Exercise 10.5, one
finds that the constant 4 in the bound (10.3) is perhaps best understood
when interpreted as the integral

$$4 = \int_0^\infty\frac{du}{u^{1/2}\max(1,u)}.$$


SOLUTION FOR EXERCISE 10.4. One can repeat the proof of the discrete 
case line-by-line, and to do so is worth one’s time. The parallel between 
the discrete and continuous problems is really quite striking. 


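SOLUTION FOR EXERCISE 10.5. The first step exploits the homogeneity
condition of $K(x,y)$ by a homogeneous change of variables $y = ux$:

$$\begin{aligned}
\int_0^\infty\!\!\int_0^\infty K(x,y)f(x)g(y)\,dx\,dy
&= \int_0^\infty f(x)\Big\{\int_0^\infty K(x,y)g(y)\,dy\Big\}dx\\
&= \int_0^\infty f(x)\Big\{\int_0^\infty K(x,ux)g(ux)\,x\,du\Big\}dx\\
&= \int_0^\infty f(x)\Big\{\int_0^\infty K(1,u)g(ux)\,du\Big\}dx\\
&= \int_0^\infty K(1,u)\Big\{\int_0^\infty f(x)g(ux)\,dx\Big\}du.
\end{aligned}$$

Now, once $K$ has been pulled outside, we can apply Schwarz's inequality
to the inside integral to find

$$\int_0^\infty f(x)g(ux)\,dx \le \Big(\int_0^\infty f^2(x)\,dx\Big)^{1/2}\Big(\int_0^\infty g^2(ux)\,dx\Big)^{1/2} = \frac{1}{\sqrt u}\Big(\int_0^\infty f^2(x)\,dx\Big)^{1/2}\Big(\int_0^\infty g^2(y)\,dy\Big)^{1/2},$$

so we see at last that

$$\int_0^\infty\!\!\int_0^\infty K(x,y)f(x)g(y)\,dx\,dy \le \Big(\int_0^\infty K(1,u)\frac{du}{\sqrt u}\Big)\Big(\int_0^\infty f^2(x)\,dx\Big)^{1/2}\Big(\int_0^\infty g^2(y)\,dy\Big)^{1/2}.$$

This completes the solution of the exercise with $c$ given by the first of the
three indicated integrals, and we can make a simple change of variables
to check that all three of the integrals are equal.

This argument is yet another of the gems from Schur's remarkable
1911 paper. Actually, Schur proves the trickier finite range result,

$$\int_a^b\!\!\int_a^b K(x,y)f(x)g(y)\,dx\,dy \le c\Big(\int_a^b f^2(x)\,dx\Big)^{1/2}\Big(\int_a^b g^2(y)\,dy\Big)^{1/2},$$

where $0 \le a < b \le \infty$. In this case, the domain of integration changes
with the change of variables, but the original plan still works.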
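To see the constant $c = \int_0^\infty K(1,u)u^{-1/2}\,du$ concretely, the sketch below evaluates it numerically. It substitutes $u = e^s$, which turns $du/\sqrt u$ into $\sqrt u\,ds$, and applies the trapezoid rule on a truncated range. For $K(x,y) = 1/\max(x,y)$ it should reproduce the constant 4 of Exercise 10.3, and for Hilbert's kernel $K(x,y) = 1/(x+y)$ it should give $\pi$. The truncation range and step count are ad hoc choices.

```python
import math


def schur_constant(K, s_min=-60.0, s_max=60.0, steps=240_000):
    """Trapezoid approximation of c = int_0^inf K(1, u) u^(-1/2) du via u = e^s."""
    h = (s_max - s_min) / steps
    total = 0.0
    for i in range(steps + 1):
        u = math.exp(s_min + i * h)
        value = K(1.0, u) * math.sqrt(u)   # K(1, u) u^(-1/2) times du/ds = u
        total += value if 0 < i < steps else value / 2
    return total * h


c_max = schur_constant(lambda x, y: 1.0 / max(x, y))
c_hilbert = schur_constant(lambda x, y: 1.0 / (x + y))
print(c_max, c_hilbert)
```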


SOLUTION FOR EXERCISE 10.6. Integral comparison gives part (a) by 


t t t mit © dy 
ee ee, apes f fae 
and for part (b) we note that


is minimized by taking t = (B/A)?. Part (c) just assembles the pieces. 


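SOLUTION FOR EXERCISE 10.7. Since $\frac{1}{2\pi}\int_0^{2\pi}(t - \pi)e^{int}\,dt = 1/(in)$ for each integer $n \ne 0$,
and since $|t - \pi| \le \pi$ for $t \in [0, 2\pi]$, we have

$$\begin{aligned}
\Big|\sum_{m=1}^N\sum_{n=1}^N\frac{a_mb_n}{m+n}\Big|
&= \Big|\frac{1}{2\pi}\int_0^{2\pi}i(t - \pi)\Big(\sum_{k=1}^N a_ke^{ikt}\Big)\Big(\sum_{k=1}^N b_ke^{ikt}\Big)dt\Big|\\
&\le \frac{1}{2}\int_0^{2\pi}\Big|\sum_{k=1}^N a_ke^{ikt}\Big|\,\Big|\sum_{k=1}^N b_ke^{ikt}\Big|\,dt\\
&\le \frac{1}{2}\Big\{\int_0^{2\pi}\Big|\sum_{k=1}^N a_ke^{ikt}\Big|^2dt\Big\}^{1/2}\Big\{\int_0^{2\pi}\Big|\sum_{k=1}^N b_ke^{ikt}\Big|^2dt\Big\}^{1/2}\\
&= \pi\Big\{\sum_{k=1}^N a_k^2\Big\}^{1/2}\Big\{\sum_{k=1}^N b_k^2\Big\}^{1/2}.
\end{aligned}$$

This remarkably quick way of obtaining Hilbert's inequality is known
as Toeplitz's method. Hilbert's original proof also used trigonometric
integrals, but those used by Hilbert were not quite as efficient. Toeplitz's
argument tells us more generally that if $\phi$ is any bounded function on
$[0, 2\pi]$ with Fourier coefficients $c_n = \frac{1}{2\pi}\int_0^{2\pi}\phi(t)e^{int}\,dt$, $-\infty < n < \infty$, then one has the
bound

$$\Big|\sum_{m=1}^N\sum_{n=1}^N c_{m+n}a_mb_n\Big| = \Big|\frac{1}{2\pi}\int_0^{2\pi}\phi(t)\Big(\sum_{k=1}^N a_ke^{ikt}\Big)\Big(\sum_{k=1}^N b_ke^{ikt}\Big)dt\Big| \le \|\phi\|_\infty\|a\|_2\|b\|_2.$$

The choice $\phi(t) = i(t - \pi)$ gives $c_n = 1/n$ for $n \ne 0$ and $\|\phi\|_\infty = \pi$, which recovers the bound above.

Integral representations can also be used to prove more distinctive generalizations of Hilbert's inequality. For example, if $\alpha \notin \mathbb{Z}$ one finds

$$\frac{1}{2\pi}\int_0^{2\pi}e^{i(n+\alpha)t}\,dt = \frac{e^{2\pi i\alpha} - 1}{2\pi i(n + \alpha)} \qquad\text{for all } n \in \mathbb{Z},$$

and this representation can be used to show

$$\Big|\sum_{m=1}^N\sum_{n=1}^N\frac{a_mb_n}{m + n + \alpha}\Big| \le \frac{\pi}{|\sin\pi\alpha|}\|a\|_2\|b\|_2.$$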
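Two ingredients of Toeplitz's method can be checked numerically. First, the Fourier coefficients of $\phi(t) = i(t - \pi)$ are $1/n$, taking the convention $c_n = \frac{1}{2\pi}\int_0^{2\pi}\phi(t)e^{int}\,dt$ as an assumption of this sketch. Second, the resulting Hilbert form never exceeds $\pi\|a\|_2\|b\|_2$. The sketch below uses a midpoint rule and random test vectors, and the step counts, sizes, and seeds are arbitrary.

```python
import cmath
import math
import random


def fourier_coefficient(n, steps=20_000):
    """Midpoint-rule value of (1/2pi) * int_0^{2pi} i (t - pi) e^{int} dt."""
    h = 2 * math.pi / steps
    total = 0j
    for j in range(steps):
        t = (j + 0.5) * h
        total += 1j * (t - math.pi) * cmath.exp(1j * n * t)
    return total * h / (2 * math.pi)


def hilbert_form(a, b):
    """sum_{m,n=1}^N a_m b_n / (m + n); the list entry a[0] holds a_1."""
    N = len(a)
    return sum(a[m - 1] * b[n - 1] / (m + n)
               for m in range(1, N + 1) for n in range(1, N + 1))


def norm2(v):
    return math.sqrt(sum(x * x for x in v))


rng = random.Random(3)
worst_ratio = 0.0
for _ in range(300):
    N = rng.randint(1, 40)
    a = [rng.uniform(-1, 1) for _ in range(N)]
    b = [rng.uniform(-1, 1) for _ in range(N)]
    worst_ratio = max(worst_ratio, abs(hilbert_form(a, b)) / (norm2(a) * norm2(b)))

# Slowly decaying vectors move toward the constant pi, but only slowly.
N = 600
slow = [1 / math.sqrt(n) for n in range(1, N + 1)]
slow_ratio = hilbert_form(slow, slow) / norm2(slow) ** 2
print(worst_ratio, slow_ratio, math.pi)
```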
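SOLUTION FOR EXERCISE 10.8. We substitute and change orders:

$$\begin{aligned}
\int_0^\infty\frac{dy}{(1+y)y^{2\lambda}} &= \int_0^\infty\Big\{\int_0^\infty e^{-t(1+y)}\,dt\Big\}\frac{dy}{y^{2\lambda}}\\
&= \int_0^\infty e^{-t}\Big\{\int_0^\infty e^{-ty}\frac{dy}{y^{2\lambda}}\Big\}dt\\
&= \int_0^\infty e^{-t}\{t^{2\lambda-1}\Gamma(1 - 2\lambda)\}\,dt = \Gamma(2\lambda)\Gamma(1 - 2\lambda).
\end{aligned}$$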
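Suppose the quantity computed in this solution is $\int_0^\infty dy/\{(1+y)y^{2\lambda}\}$ with $0 < \lambda < 1/2$. Then the result can be checked against the reflection formula $\Gamma(2\lambda)\Gamma(1-2\lambda) = \pi/\sin(2\pi\lambda)$. The sketch below uses `math.gamma` from the Python standard library and the substitution $y = e^s$. The truncation range and step count are ad hoc.

```python
import math


def integral(lam, s_min=-80.0, s_max=80.0, steps=160_000):
    """Trapezoid value of int_0^inf dy / ((1 + y) y^(2 lam)) after y = e^s."""
    h = (s_max - s_min) / steps
    total = 0.0
    for i in range(steps + 1):
        s = s_min + i * h
        # dy = e^s ds turns the integrand into e^{(1 - 2 lam) s} / (1 + e^s).
        value = math.exp((1 - 2 * lam) * s) / (1 + math.exp(s))
        total += value if 0 < i < steps else value / 2
    return total * h


for lam in (0.1, 0.25, 0.4):
    gamma_product = math.gamma(2 * lam) * math.gamma(1 - 2 * lam)
    print(lam, integral(lam), gamma_product, math.pi / math.sin(2 * math.pi * lam))
```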


CHAPTER 11: HARDY'S INEQUALITY AND THE FLOP


SOLUTION FOR EXERCISE 11.1. By applying Hölder's inequality with
$p = \beta/\alpha$ and $q = \beta/(\beta - \alpha)$ to the right side of the bound (11.20) we
obtain the inequality

$$\int_0^T\phi^\beta(x)\,dx \le C\Big(\int_0^T\phi^\beta(x)\,dx\Big)^{\alpha/\beta}\Big(\int_0^T\psi^{\beta/(\beta-\alpha)}(x)\,dx\Big)^{(\beta-\alpha)/\beta}.$$

There is no loss of generality if we assume that the first integral factor
on the right is nonzero, so we may divide both sides by that factor. If
we then raise both sides of the resulting bound to the power $\beta/(\beta - \alpha)$,
we get our target bound (11.20).

It is only in the division step where we use the condition that $\phi$ is
bounded. The inequality for bounded functions can then be used to
prove a corresponding inequality for functions that need not be bounded.
It is quite common in arguments that call on the flop for one to first
consider bounded functions so that one can sidestep any inappropriate
arithmetic with integrals that might be infinite.


SOLUTION FOR EXERCISE 11.2. By the hypothesis and the AM-GM inequality we have

$$2x^3 \le y^3 + y^2x + yx^2 \le y^3 + \Big(\frac{2y^3}{3} + \frac{x^3}{3}\Big) + \Big(\frac{y^3}{3} + \frac{2x^3}{3}\Big) = 2y^3 + x^3,$$

so $x^3 \le 2y^3$. In this example, the higher power on the left made the transformation
possible, but for the transformation to be nontrivial one also needed
cooperation from the constant factors. If we replace 2 by 1/2 in the
original problem, we only obtain the trivial bound $-x^3 \le 4y^3$.


SOLUTION FOR EXERCISE 11.3. The hypothesis (11.21) and Schwarz’s 
inequality give us 


[ dO < [ #o) wot |” u'(0) ww\ {f*@ ww\ 


which is $x^2 \le c^2 + 6cx$ with the natural identifications. If we solve this
for the case of equality, we find $x = c(6 + \sqrt{40})/2$, so the hypothesis
(11.21) implies the bound (11.22) if we take $A = \{(6 + \sqrt{40})/2\}^2$.


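SOLUTION FOR EXERCISE 11.4. Only a few obvious changes are needed
to convert the proof of the $L^2$ bound (11.1) to a proof for the corresponding $L^p$ bound (11.23). By that analogy, we first note

$$\int_0^T\Big\{\frac{1}{x}\int_0^x f(u)\,du\Big\}^p dx = \frac{1}{1-p}\int_0^T\Big\{\int_0^x f(u)\,du\Big\}^p(x^{1-p})'\,dx,$$

so integration by parts gives us

$$\int_0^T\Big\{\frac{1}{x}\int_0^x f(u)\,du\Big\}^p dx = \frac{1}{1-p}\,x^{1-p}\Big\{\int_0^x f(u)\,du\Big\}^p\,\Big|_0^T + \frac{p}{p-1}\int_0^T f(x)\Big\{\frac{1}{x}\int_0^x f(u)\,du\Big\}^{p-1}dx.$$

As before, the boundary contribution at zero is zero, and the contribution at $T$ is nonpositive; therefore, we have the bound

$$\int_0^T\Big\{\frac{1}{x}\int_0^x f(u)\,du\Big\}^p dx \le \frac{p}{p-1}\int_0^T f(x)\Big\{\frac{1}{x}\int_0^x f(u)\,du\Big\}^{p-1}dx,$$

which is the $L^p$ analog of the preflop $L^2$ bound (11.4). One now finishes
with the $L^p$ flop precisely as in Exercise 11.1 provided that one sets
$\alpha = p - 1$ and $\beta = p$.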


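SOLUTION FOR EXERCISE 11.5. Without loss of generality we assume
that $a_n \ge 0$ for all $n = 1, 2, \ldots$. We then set $A_n = a_1 + a_2 + \cdots + a_n$ and
apply Cauchy's inequality followed by Hardy's inequality (11.9) to get

$$\sum_{n=1}^\infty a_n\frac{A_n}{n} \le \Big(\sum_{n=1}^\infty a_n^2\Big)^{1/2}\Big(\sum_{n=1}^\infty\Big(\frac{A_n}{n}\Big)^2\Big)^{1/2} \le 2\sum_{n=1}^\infty a_n^2.$$

We then finish with the even simpler bound

$$\sum_{1\le m\le n<\infty}\frac{a_ma_n}{m+n} \le \sum_{n=1}^\infty\frac{a_n}{n}\sum_{m=1}^n a_m = \sum_{n=1}^\infty a_n\frac{A_n}{n}.$$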


SOLUTION FOR EXERCISE 11.6. The solution of this problem is not easy, 
and here we follow the one provided by Richberg (1993) that begins with 
the observation that 


ff ae D fl forraa= 0 (GF), 


so our target inequality is equivalent to 


— (st)% dsdt l-—«z 
4 log 2) ——. 
ff? gan age Oe Tae 
This bound would follow from


Led 
dsdt l-«@ 
4 log 2) —— 
i ion oe ree, 


and by a direct calculation one finds 


dsdt 1 dt 
= ] 1+t)— 
[fp = 2 f logit +9. 


so the proof of our target inequality is reduced to showing


1 
dt 1l-z 


This bound and the fact that it is sharp was already addressed in Exer- 
cise 7.10, so the solution is complete. 


SOLUTION FOR EXERCISE 11.7. This observation is painfully obvious,
but it seems necessary for completeness. The hypothesis gives us the
bounds $b_1 \le a_1, b_2 \le a_2, \ldots, b_N \le a_N$; thus, for all $1 \le n \le N$ we have
$(b_1b_2\cdots b_n)^{1/n} \le (a_1a_2\cdots a_n)^{1/n}$, which is more than we need. There
are questions on infinite rearrangements which are subtle, but this is not
one of them.


SOLUTION FOR EXERCISE 11.8. From the convergence of the sum, we know that the sequence of remainders
$r_n = a_{n+1}/(n+1) + a_{n+2}/(n+2) + a_{n+3}/(n+3) + \cdots$ must
converge to zero as $n \to \infty$. When we write these terms in longhand,

$$\begin{aligned}
r_0 &= a_1 + a_2/2 + a_3/3 + \cdots + a_n/n + r_n\\
r_1 &= a_2/2 + a_3/3 + \cdots + a_n/n + r_n\\
r_2 &= a_3/3 + \cdots + a_n/n + r_n\\
&\;\;\vdots\\
r_{n-2} &= a_{n-1}/(n-1) + a_n/n + r_n\\
r_{n-1} &= a_n/n + r_n,
\end{aligned}$$

we see they may be summed to yield the nice identity

$$\frac{a_1 + a_2 + \cdots + a_n}{n} = -r_n + \frac{r_0 + r_1 + \cdots + r_{n-1}}{n}, \qquad (14.61)$$

which makes the limit (11.25) routine, since $r_n \to 0$ and the Cesàro averages of a null sequence also tend to zero.
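The identity (14.61) is pure bookkeeping, so it can be verified in exact arithmetic. The sketch below uses Python's `fractions.Fraction`. To keep every remainder a finite sum, it assumes a finitely supported sequence, with arbitrary sample values. The identity then holds for every $n$, including values of $n$ past the end of the support.

```python
from fractions import Fraction


def remainder(a, n):
    """r_n = sum_{j > n} a_j / j for a = (a_1, ..., a_L), with a_j = 0 for j > L."""
    return sum((Fraction(a[j - 1], j) for j in range(n + 1, len(a) + 1)), Fraction(0))


def identity_holds(a, n):
    """Check (a_1 + ... + a_n)/n == -r_n + (r_0 + ... + r_{n-1})/n exactly."""
    left = Fraction(sum(a[:n]), n)
    right = -remainder(a, n) + sum((remainder(a, k) for k in range(n)), Fraction(0)) / n
    return left == right


a = [3, -1, 4, 1, -5, 9, 2, -6]
print([identity_holds(a, n) for n in range(1, 12)])
```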


CHAPTER 12: SYMMETRIC SUMS 


SOLUTION FOR EXERCISE 12.1. If the roots of $P(x)$ are $x_1, x_2, \ldots, x_n$,
then $a_{n-1}/a_n = 1/x_1 + 1/x_2 + \cdots + 1/x_n$ and $a_1 = x_1 + x_2 + \cdots + x_n$,
so we have $n(a_{n-1}/a_n)^{-1} \le a_1/n$ by the HM-AM inequality (8.14). This
exercise offers a basic reminder: facts for polynomial coefficients and
facts for symmetric sums are almost in a one-to-one correspondence.


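SOLUTION FOR EXERCISE 12.2. (a) By expansion and simplification,
we see that we need to prove

$$6abc \le ac^2 + ab^2 + ba^2 + bc^2 + ca^2 + cb^2 = R,$$

and after setting $a = x_1$, $b = x_2$, and $c = x_3$ we also have

$$\sum_{\sigma\in S(3)}x_{\sigma(1)}x_{\sigma(2)}x_{\sigma(3)} = 6abc \qquad\text{and}\qquad \sum_{\sigma\in S(3)}x_{\sigma(1)}^2x_{\sigma(2)} = R.$$

Since $(1,1,1) = \frac16(2,1,0) + \frac16(2,0,1) + \cdots + \frac16(0,1,2)$, we have that $(1,1,1)$ is
in $H[(2,1,0)]$, so we may apply Muirhead's inequality.

(b) We have $(1,1,0,\ldots,0) = \frac12(2,0,0,\ldots,0) + \frac12(0,2,0,\ldots,0)$, so we
have $(1,1,0,\ldots,0) \in H[(2,0,0,\ldots,0)]$, and by Muirhead's inequality it
suffices to note that

$$\sum_{\sigma\in S(n)}x_{\sigma(1)}x_{\sigma(2)} = 2(n-2)!\sum_{1\le j<k\le n}x_jx_k \qquad\text{and}\qquad \sum_{\sigma\in S(n)}x_{\sigma(1)}^2 = (n-1)!\sum_{k=1}^n x_k^2.$$

(c) Since the average $\{(1/2,1/2,0,\ldots,0) + \cdots + (0,\ldots,0,1/2,1/2)\}/\binom{n}{2}$ of all $\binom{n}{2}$ vectors with two entries equal to $1/2$
equals $(1/n, 1/n, \ldots, 1/n)$, it suffices by Muirhead's inequality to note

$$\sum_{\sigma\in S(n)}x_{\sigma(1)}^{1/n}x_{\sigma(2)}^{1/n}\cdots x_{\sigma(n)}^{1/n} = n!\,(x_1x_2\cdots x_n)^{1/n} \qquad\text{and}\qquad \sum_{\sigma\in S(n)}x_{\sigma(1)}^{1/2}x_{\sigma(2)}^{1/2} = 2(n-2)!\sum_{1\le j<k\le n}(x_jx_k)^{1/2}.$$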
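For small $n$, the symmetric sums $\sum_\sigma x_{\sigma(1)}^{\alpha_1}\cdots x_{\sigma(n)}^{\alpha_n}$ used above are easy to compute by brute force. That gives a quick way to check both the counting identities and the direction of Muirhead's inequality. This is a sketch for experimentation only, since it loops over all $n!$ permutations, and the sample points are arbitrary.

```python
import itertools
import math
import random


def symmetric_sum(alpha, x):
    """sum over permutations sigma of prod_k x_{sigma(k)}^{alpha_k}."""
    return sum(math.prod(xk ** ak for xk, ak in zip(perm, alpha))
               for perm in itertools.permutations(x))


def majorized_by(alpha, beta, tol=1e-12):
    """True when alpha is majorized by beta (so alpha lies in the hull H[beta])."""
    a = sorted(alpha, reverse=True)
    b = sorted(beta, reverse=True)
    partial_a = partial_b = 0.0
    for ak, bk in zip(a, b):
        partial_a += ak
        partial_b += bk
        if partial_a > partial_b + tol:
            return False
    return abs(partial_a - partial_b) <= tol


rng = random.Random(4)
x3 = [rng.uniform(0.1, 3.0) for _ in range(3)]
x4 = [rng.uniform(0.1, 3.0) for _ in range(4)]
a, b, c = x3
print(symmetric_sum((1, 1, 1), x3), 6 * a * b * c)
print(symmetric_sum((2, 1, 0), x3))
```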


SOLUTION FOR EXERCISE 12.3. Multiply the left side of the bound
(12.23) by $(xyz)^{1/3}$ and consider the candidate inequality

$$x^{7/3}y^{1/3}z^{1/3} + x^{1/3}y^{7/3}z^{1/3} + x^{1/3}y^{1/3}z^{7/3} \le x^3 + y^3 + z^3. \qquad (14.62)$$

This generalizes our original problem in the sense that if we can prove
that the bound (14.62) holds for all nonnegative $x, y, z$ then the bound
(12.23) must hold when $xyz = 1$. Fortunately, the new bound (14.62) is
a corollary of Muirhead's inequality and the relationship

$$(7/3, 1/3, 1/3) = \tfrac{7}{9}(3,0,0) + \tfrac{1}{9}(0,3,0) + \tfrac{1}{9}(0,0,3).$$

Kedlaya (1999) presents several more sophisticated examples of the homogenization device.
SOLUTION FOR EXERCISE 12.4. By expanding the bound (12.24) we
see after simplification that it is equivalent to the assertion that

$$\sum_{(j,k):\,j\ne k}x_j^mx_k^m \le \sum_{(j,k):\,j\ne k}x_j^{m+1}x_k^{m-1},$$

but $(m,m,0,\ldots,0) = \frac12(m+1,m-1,0,\ldots,0) + \frac12(m-1,m+1,0,\ldots,0)$,
so the bound (12.24) follows from Muirhead's inequality.


SOLUTION FOR EXERCISE 12.5. With surprising frequency, solvers of
this exercise find the same example discovered by Bunyakovsky (1854):

$$p(x,y) = \{x^2 + (1-y)^2\}\{y^2 + (1-x)^2\}.$$

Here one has $p(1,0) = 0$ and $p(0,1) = 0$ but otherwise $p(x,y)$ is strictly
positive. Thus, despite the symmetry of $p$, the minimum of $p$ is not on
the diagonal $D = \{(x,y) : x = y\}$; indeed, on the diagonal one has $p(t,t) = \{t^2 + (1-t)^2\}^2 \ge 1/4$. Incidentally, this problem reminds
us that whenever we are in pursuit of some conjecture, it is important
to allocate time to the search for counterexamples. One often discovers
quite quickly that the conjecture must be refined, or even rejected.


SOLUTION FOR EXERCISE 12.6. First, by (cyclical) symmetry, we can
assume that $x \ge y$ and $x \ge z$. This makes $x$ "special," so it is then
natural to consider the symmetry properties of $y$ and $z$. If we consider
the difference

$$f(x,y,z) - f(x,z,y) = (y - z)(x - y)(x - z),$$

we see it is nonpositive when $y$ is less than $z$, so we can assume without
loss of generality that $y \ge z$. Finally, assuming $x \ge y \ge z$ we note
$f(x+z, y, 0) - f(x,y,z) = z^2y + yz(x - y) + zx(y - z) \ge 0$, so we may
also assume without loss of generality that $z = 0$. We can now finish
with calculus as suggested by the hint or, alternatively, we can use the
AM-GM inequality to check that for $x + y = 1$ we have

$$x^2y = 4\cdot\frac{x}{2}\cdot\frac{x}{2}\cdot y \le 4\Big(\frac{x/2 + x/2 + y}{3}\Big)^3 = \frac{4}{27}.$$
One lesson to take away from this exercise is that it is often possible 
to make step-by-step progress by considering how a function changes 
when subjected to simple transformations such as the interchange of 
two variables. 
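The reductions above say something concrete about the cyclic sum $f(x,y,z) = x^2y + y^2z + z^2x$. That is the function the two difference identities in this solution force $f$ to be. For nonnegative $x, y, z$ with $x + y + z = 1$, it is at most $4/27$, with the maximum at $(2/3, 1/3, 0)$ and its cyclic shifts. The brute-force search in this sketch runs over a rational grid on the simplex in exact arithmetic, with an arbitrary grid size. It agrees.

```python
from fractions import Fraction


def f(x, y, z):
    return x * x * y + y * y * z + z * z * x


def grid_maximum(N):
    """Largest f on the grid {(i/N, j/N, k/N) : i + j + k = N}, with its first maximizer."""
    best, argbest = Fraction(-1), None
    for i in range(N + 1):
        for j in range(N + 1 - i):
            point = (Fraction(i, N), Fraction(j, N), Fraction(N - i - j, N))
            value = f(*point)
            if value > best:
                best, argbest = value, point
    return best, argbest


best, argbest = grid_maximum(60)
print(best, argbest)
```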
SOLUTION FOR EXERCISE 12.7. Pitman solves his Problem 3.1.24 by
first expanding $1 = (x + y + z)^3$ and then noting that it suffices to show

$$Q = x^2y + x^2z + y^2x + y^2z + z^2x + z^2y \le 1/4 \qquad\text{when } x + y + z = 1.$$

If we write $Q = x\{x(y+z)\} + y\{y(x+z)\} + z\{z(x+y)\}$, then it now
suffices to notice that each of the three braced expressions is bounded
above by 1/4 by the AM-GM inequality; for example, $x(y+z) = x(1-x) \le 1/4$. Hence $Q \le (x + y + z)/4 = 1/4$. Other solutions can be based
on the homogenization trick of Exercise 12.3, or Schur's inequality (page
83), or the reduction devices of Exercise 12.6.


SOLUTION FOR EXERCISE 12.8. This elementary (but very useful!) inequality serves as a reminder that symmetry is often the key to successful
telescoping. Here the telescoping identity

$$a_1a_2\cdots a_n - b_1b_2\cdots b_n = \sum_{j=1}^n a_1\cdots a_{j-1}(a_j - b_j)b_{j+1}\cdots b_n$$

makes the Weierstrass inequality immediate. Naturally, generalizations
of this identity lead one to more elaborate versions of the Weierstrass inequality.


CHAPTER 13: MAJORIZATION AND SCHUR CONVEXITY 


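SOLUTION FOR EXERCISE 13.1. From each of the representations

$$\begin{pmatrix}a\\ b\\ c\end{pmatrix} = \begin{pmatrix}1/2 & 1/3 & 1/6\\ 1/3 & 2/3 & 0\\ 1/6 & 0 & 5/6\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix}a\\ b\\ c\end{pmatrix} = \begin{pmatrix}0 & 1/2 & 1/2\\ 1/2 & 1/6 & 1/3\\ 1/2 & 1/3 & 1/6\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix}$$

one gets $(a,b,c) \prec (x,y,z)$, since both matrices are doubly stochastic. The inequalities of the exercise then follow
from the Schur concavity of the map $(x,y,z) \mapsto xyz$ and the Schur
convexity of the map $(x,y,z) \mapsto 1/x^3 + 1/y^3 + 1/z^3$.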
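Both representations rest on the two matrices being doubly stochastic, and a matrix of that kind always maps a vector to one that it majorizes. The exact-arithmetic sketch below uses `fractions.Fraction` and an arbitrary test vector. It checks the row and column sums of the two matrices, then confirms the majorization and the product inequality for one sample $(x,y,z)$.

```python
from fractions import Fraction as Fr

D1 = [[Fr(1, 2), Fr(1, 3), Fr(1, 6)],
      [Fr(1, 3), Fr(2, 3), Fr(0)],
      [Fr(1, 6), Fr(0), Fr(5, 6)]]
D2 = [[Fr(0), Fr(1, 2), Fr(1, 2)],
      [Fr(1, 2), Fr(1, 6), Fr(1, 3)],
      [Fr(1, 2), Fr(1, 3), Fr(1, 6)]]


def is_doubly_stochastic(D):
    n = len(D)
    return (all(d >= 0 for row in D for d in row)
            and all(sum(row) == 1 for row in D)
            and all(sum(D[j][k] for j in range(n)) == 1 for k in range(n)))


def matvec(D, v):
    return [sum(d * vk for d, vk in zip(row, v)) for row in D]


def majorized_by(u, v):
    """True when u is majorized by v (exact comparison, equal total sums)."""
    a, b = sorted(u, reverse=True), sorted(v, reverse=True)
    partial_a = partial_b = 0
    for ak, bk in zip(a, b):
        partial_a += ak
        partial_b += bk
        if partial_a > partial_b:
            return False
    return partial_a == partial_b


v = [Fr(3), Fr(5), Fr(11)]
for D in (D1, D2):
    w = matvec(D, v)
    print(w, majorized_by(w, v), w[0] * w[1] * w[2] >= v[0] * v[1] * v[2])
```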


SOLUTION FOR EXERCISE 13.2. If we set $s = (x_1 + x_2 + \cdots + x_n)/k$
we have $(x_1, x_2, \ldots, x_n) \prec (s, s, \ldots, s, 0, 0, \ldots, 0)$ when we take $k$ copies
of $s$. Thus, for convex $\phi : [0,\infty) \to \mathbb{R}$, Schur's majorization inequality
(13.18) gives us $\phi(x_1) + \phi(x_2) + \cdots + \phi(x_n) \le (n-k)\phi(0) + k\phi(s)$, and
we can set $\phi(x) = 1/(1+x)$ to obtain the bound (13.21).


SOLUTION FOR EXERCISE 13.3. If one sets

$$y_k = \begin{cases}\mu + d/m & \text{for } 1 \le k \le m\\ \mu - d/(n-m) & \text{for } m < k \le n\end{cases}$$

where $\mu = (x_1 + x_2 + \cdots + x_n)/n$, then from the condition (13.22) it follows
easily that $y \prec x$. The map $f(x) = x_1^2 + x_2^2 + \cdots + x_n^2$ is Schur convex, so
we have $f(y) \le f(x)$, and, after expansion, this is precisely the target
inequality (13.23). For the connection to Szemerédi's Regularity Lemma,
see Komlós and Simonovits (1996).


SOLUTION FOR EXERCISE 13.4. Two applications of the cancellation identity (13.24) permit one to reduce Schur's differential (13.4) for $e_k$ to

$$-(x_s - x_t)^2\,e_{k-2}(x_1, x_2, \ldots, x_{s-1}, x_{s+1}, \ldots, x_{t-1}, x_{t+1}, \ldots, x_n),$$

and this polynomial is obviously nonpositive for $x \in [0,\infty)^n$.

SOLUTION FOR EXERCISE 13.5. Use Schur's criterion (13.4) and note

$$(x_j - x_k)\big(s^2_{x_j}(x) - s^2_{x_k}(x)\big) = 2(x_j - x_k)^2/(n-1) \ge 0 \qquad\text{and}$$

$$(p_j - p_k)\big(h_{p_j}(p) - h_{p_k}(p)\big) = (p_j - p_k)(\log p_j - \log p_k) \ge 0,$$

where the subscripts connote partial derivatives. Incidentally, the second
formula verifies that $h(p)$ is Schur convex on all of $(0,\infty)^n$, not just the
subset of $(0,\infty)^n$ where $p$ has sum equal to one.


SOLUTION FOR EXERCISE 13.6. Since $(1/n, 1/n, \ldots, 1/n) \prec p$, this is a
special case of the bound (13.18) for $\phi(x) = (x + 1/x)^\alpha$ since

$$\phi''(x) = \alpha(x + 1/x)^\alpha(x + x^3)^{-2}\{(1 + 4x^2 - x^4) + \alpha(1 - x^2)^2\}$$

must be positive for $0 < x < 1$ and $\alpha > 0$. The relevance of Schur
convexity to this problem was noted by Marshall and Olkin (1979, p. 72);
a proof using Lagrange multipliers is given by Mitrinović (1970, p. 282).


SOLUTION FOR EXERCISE 13.7. In the uniform case the probability is
$1 - (1 - 1/365)(1 - 2/365)\cdots(1 - 22/365) \approx 0.5073$. In the general
case, with $n = 23$ people, the probability is $1 - n!\,e_n(p_1, p_2, \ldots, p_{365})$ where $e_n(p)$ is the $n$th
elementary symmetric polynomial and $p_k$ is the probability that a randomly chosen
person is born on day $k$. By Exercise 13.4 the polynomial $e_n(p)$ is Schur
concave, and this is even more than one needs. The connection between
majorization and the birthday problem has been made in Clevenson and
Watkins (1991) and Proschan and Joag-Dev (1992); McConnell (2001)
gives a treatment for nonuniform probabilities without explicit recourse
to majorization.
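The general formula is easy to evaluate. The probability that $n$ independent birthdays are all distinct is $n!\,e_n(p)$, and $e_n(p)$ can be computed with the standard one-pass recursion for elementary symmetric polynomials. The sketch below assumes $n = 23$ people. It compares the uniform distribution with one arbitrary nonuniform choice of day probabilities. Schur concavity predicts that the nonuniform case has the larger chance of a shared birthday.

```python
import math


def elementary_symmetric(p, k):
    """e_k(p_1, ..., p_m) via the update e_j <- e_j + p_i * e_{j-1}."""
    e = [1.0] + [0.0] * k
    for pi in p:
        for j in range(k, 0, -1):   # descending j so each p_i is used at most once
            e[j] += pi * e[j - 1]
    return e[k]


def shared_birthday_probability(p, n):
    """1 - n! e_n(p): chance that n independent birthdays are not all distinct."""
    return 1.0 - math.factorial(n) * elementary_symmetric(p, n)


uniform = [1 / 365] * 365
weights = [1.5 if day < 100 else 1.0 for day in range(365)]
total_weight = sum(weights)
skewed = [w / total_weight for w in weights]

closed_form = 1.0 - math.prod(1 - k / 365 for k in range(1, 23))
print(closed_form, shared_birthday_probability(uniform, 23),
      shared_birthday_probability(skewed, 23))
```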


SOLUTION FOR EXERCISE 13.8. The necessity of the condition is imme- 
diate, so we just need to prove sufficiency. In Weyl’s terms, girl 7 knows 
precisely the boys in the set Sj, so for a given a set A of girls, every boy 
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in the set Uje 45; will be known by some girl in A. We now consider 
two cases. 

In Case I, we assume that the inequality (13.26) is strict for all nonempty $A$ with
$|A| < n$. Girl $n$ then marries any boy $b$ she knows. Since the condition
(13.26) continues to hold for all $A \subseteq \{1, 2, \ldots, n-1\}$ when each $S_j$,
$1 \le j \le n-1$, is replaced by $S_j \setminus \{b\}$, the remaining girls can be married
by induction to the remaining boys.

In Case II, we assume that equality holds in the bound (13.26) for
some nonempty $A_0$ with $|A_0| < n$. We then let

$$B = \bigcup_{j \in A_0} S_j \quad\text{and set}\quad S'_j = S_j \setminus B \quad\text{for all } j \in A_0^c.$$

The girls in $A_0$ can be married to the boys in $B$ by induction, and it
remains to show that the girls in $A_0^c$ can be married to the boys in $B^c$.
We now take any $A \subseteq A_0^c$ and note that

$$\Bigl|\bigcup_{j \in A_0 \cup A} S_j\Bigr| \ge |A_0 \cup A| = |A| + |A_0|.$$

We also have the identity

$$\Bigl|\bigcup_{j \in A_0 \cup A} S_j\Bigr| = \Bigl|\Bigl\{\bigcup_{j \in A_0} S_j\Bigr\} \cup \Bigl\{\bigcup_{j \in A} S'_j\Bigr\}\Bigr| = |A_0| + \Bigl|\bigcup_{j \in A} S'_j\Bigr|.$$

Thus, we find for all $A \subseteq A_0^c$ that we have

$$\Bigl|\bigcup_{j \in A} S'_j\Bigr| \ge |A|;$$

that is, every set of $k$ girls in $A_0^c$ knows at least $k$ boys in $B^c$. By
induction the girls in $A_0^c$ can be married to the boys in $B^c$. This proof is
essentially the one given by Halmos and Vaughan (1950). The marriage
lemma is a cornerstone of the large and active field of matching theory,
which is beautifully surveyed by Lovász and Plummer (1986).
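For small instances one can check Hall’s condition (13.26) by brute force and then actually find the marriages. The sketch below does not follow the inductive proof above. Instead it uses the standard augmenting-path search for bipartite matching (often attributed to Kuhn), which is a different technique that reaches the same conclusion. The function names are hypothetical. Girls are indexed $0, 1, \ldots, n-1$, and `S[j]` is the set of boys known to girl $j$.

```python
from itertools import combinations


def hall_condition(S):
    """True if every nonempty set A of girls knows at least |A| boys."""
    n = len(S)
    for size in range(1, n + 1):
        for A in combinations(range(n), size):
            if len(set().union(*(S[j] for j in A))) < size:
                return False
    return True


def marry(S):
    """Return a dict girl -> boy with distinct boys, or None if impossible."""
    wife_of = {}  # boy -> girl currently married to him

    def try_girl(j, seen):
        for b in sorted(S[j]):
            if b in seen:
                continue
            seen.add(b)
            # b is free, or his current wife can switch to another boy she knows
            if b not in wife_of or try_girl(wife_of[b], seen):
                wife_of[b] = j
                return True
        return False

    for j in range(len(S)):
        if not try_girl(j, set()):
            return None
    return {girl: boy for boy, girl in wife_of.items()}


S_good = [{0, 1}, {1, 2}, {0, 2}]
S_bad = [{0}, {0}, {1, 2}]  # girls 0 and 1 know only one boy between them
print(hall_condition(S_good), marry(S_good))
print(hall_condition(S_bad), marry(S_bad))  # False None
```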


SOLUTION FOR EXERCISE 13.9. One can argue by induction on the 
number of nonzero entries of D, but it is perhaps more concrete to look 
for an algorithm to compute the required convex combination. Either 
way, the basic idea is to use the marriage lemma to make step-by-step 
progress. 

For each $1 \le j \le n$, we let $S_j$ denote the set of all $k$ such that $d_{jk} > 0$,
and we note that for each $A \subseteq \{1, 2, \ldots, n\}$ one has

$$|A| = \sum_{j \in A} \sum_{k \in S_j} d_{jk} \le \sum_{k \in \bigcup_{j \in A} S_j} \; \sum_{1 \le j \le n} d_{jk} = \Bigl|\bigcup_{j \in A} S_j\Bigr|.$$

By the marriage lemma, there is a system of distinct representatives (SDR) for $\{S_1, S_2, \ldots, S_n\}$, so
we can define a permutation $\sigma$ by taking $\sigma(j)$ to be the representative
from $S_j$ for each $j = 1, 2, \ldots, n$. Now, we let $P_\sigma$ be the permutation
matrix associated with $\sigma$ and set $\alpha = \min_j d_{j\sigma(j)} > 0$. If $\alpha = 1$ then
$D$ is a permutation matrix, and there is nothing left to prove. On the
other hand, if $\alpha < 1$, consider the new matrix $D'$ defined by setting
$D' = (1 - \alpha)^{-1}(D - \alpha P_\sigma)$. We then have $D = \alpha P_\sigma + (1 - \alpha) D'$, and
$D'$ is a doubly stochastic matrix with more zero entries than $D$. The
proof may now be completed by applying the induction hypothesis to $D'$.
Alternatively, one can compute the required summands by repeating the
analogous steps until the representation is complete; at most $n^2$ steps
will be needed.
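The algorithmic version of the argument can be sketched directly. The sketch uses exact rational arithmetic via `fractions.Fraction`, so that zero entries are detected exactly. It finds the system of distinct representatives with an augmenting-path search. Instead of renormalizing to $D'$ at each step, it subtracts $\alpha P_\sigma$ from a running remainder. The remainder is always a nonnegative multiple of a doubly stochastic matrix, so the marriage-lemma step still applies, and each pass zeros at least one entry.

```python
from fractions import Fraction


def distinct_representatives(support):
    """support[j] = set of admissible columns for row j; returns sigma as a list."""
    owner = {}  # column -> row

    def augment(j, seen):
        for k in sorted(support[j]):
            if k not in seen:
                seen.add(k)
                if k not in owner or augment(owner[k], seen):
                    owner[k] = j
                    return True
        return False

    for j in range(len(support)):
        if not augment(j, set()):
            raise ValueError("no SDR: the matrix is not doubly stochastic")
    sigma = [0] * len(support)
    for k, j in owner.items():
        sigma[j] = k
    return sigma


def birkhoff(D):
    """Return [(alpha, sigma), ...] with D = sum alpha * P_sigma and sum alpha = 1."""
    n = len(D)
    R = [[Fraction(x) for x in row] for row in D]
    terms = []
    while any(R[j][k] != 0 for j in range(n) for k in range(n)):
        support = [{k for k in range(n) if R[j][k] > 0} for j in range(n)]
        sigma = distinct_representatives(support)
        alpha = min(R[j][sigma[j]] for j in range(n))
        for j in range(n):
            R[j][sigma[j]] -= alpha  # zeroes at least one entry
        terms.append((alpha, sigma))
    return terms


half, quarter = Fraction(1, 2), Fraction(1, 4)
D = [[half, quarter, quarter], [quarter, half, quarter], [quarter, quarter, half]]
for alpha, sigma in birkhoff(D):
    print(alpha, sigma)
```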


CHAPTER 14: CANCELLATION AND AGGREGATION 


SOLUTION FOR EXERCISE 14.1. To prove the second bound (14.29), we
again sum $b_1 z_1 + b_2 z_2 + \cdots + b_n z_n$ by parts to get

$$S_1(b_1 - b_2) + S_2(b_2 - b_3) + \cdots + S_{n-1}(b_{n-1} - b_n) + S_n b_n,$$

but this time we bound the sum $|b_1 z_1 + b_2 z_2 + \cdots + b_n z_n|$ by noting that it is at most

$$\begin{aligned}
&|S_1||b_1 - b_2| + |S_2||b_2 - b_3| + \cdots + |S_{n-1}||b_{n-1} - b_n| + |S_n| b_n \\
&\qquad\le \max_k |S_k| \,\{(b_2 - b_1) + (b_3 - b_2) + \cdots + (b_n - b_{n-1}) + b_n\} \\
&\qquad= \{(b_n - b_1) + b_n\} \max_k |S_k| \le 2 b_n \max_k |S_k|.
\end{aligned}$$
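A numerical check of the summation-by-parts identity and of the final bound may be reassuring. The sketch assumes, as the telescoping step requires, that $0 \le b_1 \le b_2 \le \cdots \le b_n$ and that $S_k = z_1 + \cdots + z_k$. The $z_k$ may be complex.

```python
import random


def partial_sums(z):
    """S_k = z_1 + ... + z_k for k = 1, ..., n."""
    sums, total = [], 0
    for zk in z:
        total += zk
        sums.append(total)
    return sums


def by_parts(b, z):
    """S_1(b_1 - b_2) + ... + S_{n-1}(b_{n-1} - b_n) + S_n b_n."""
    S = partial_sums(z)
    n = len(b)
    return sum(S[k] * (b[k] - b[k + 1]) for k in range(n - 1)) + S[-1] * b[-1]


def second_bound(b, z):
    """2 b_n max_k |S_k|, valid when 0 <= b_1 <= ... <= b_n."""
    return 2 * b[-1] * max(abs(s) for s in partial_sums(z))


random.seed(1)
b = sorted(random.random() for _ in range(20))
z = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20)]
direct = sum(bk * zk for bk, zk in zip(b, z))
print(abs(direct - by_parts(b, z)) < 1e-9, abs(direct) <= second_bound(b, z))  # True True
```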


SOLUTION FOR EXERCISE 14.2. From the nonnegativity of $g$ one has
the bounds

$$\min_{x \in [a,b]} f(x) \int_a^b g(x)\,dx \le \int_a^b f(x) g(x)\,dx \le \max_{x \in [a,b]} f(x) \int_a^b g(x)\,dx,$$

and by the continuity of $f$ it takes on all values between its minimum
and its maximum. These observations give us the first IMVF (14.30).
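To prove the second, choose $\Phi$ with $\Phi(a) = 0$ such that $\Phi'(x) = \phi(x)$,
then integrate by parts and apply the first IMVF with $f(x) = \Phi(x)$ and
$g(x) = -\psi'(x) \ge 0$ to find

$$\begin{aligned}
\int_a^b \phi(x)\psi(x)\,dx &= \Phi(b)\psi(b) - \int_a^b \Phi(x)\psi'(x)\,dx = \Phi(b)\psi(b) + \Phi(\xi)\{\psi(a) - \psi(b)\} \\
&= \psi(a)\Bigl\{\frac{\psi(b)}{\psi(a)}\,\Phi(b) + \Bigl(1 - \frac{\psi(b)}{\psi(a)}\Bigr)\Phi(\xi)\Bigr\}.
\end{aligned}$$

Since $0 \le \psi(b) \le \psi(a)$, the bracketed quantity is an average of $\Phi(b)$ and
$\Phi(\xi)$, so it must be equal to $\Phi(\xi_0)$ for some $\xi_0 \in [\xi, b] \subseteq [a, b]$ by the
continuity of $\Phi$. Thus $\int_a^b \phi(x)\psi(x)\,dx = \psi(a)\int_a^{\xi_0} \phi(x)\,dx$. (If $\psi(a) = 0$,
then $\psi$ vanishes identically on $[a, b]$ and there is nothing to prove.)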
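The second IMVF can be watched in action numerically. In the sketch below, the choices $\phi(x) = \cos 5x$, $\Phi(x) = \sin(5x)/5$, and $\psi(x) = e^{-x}$ on $[0, 2]$ are arbitrary illustrations. Here $\psi$ is nonnegative and nonincreasing, as the argument requires. Simpson’s rule stands in for exact integration. A scan followed by bisection then locates a point $\xi_0$ with $\psi(a)\Phi(\xi_0)$ equal to the integral.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def bonnet_point(phi, Phi, psi, a, b, grid=20000):
    """Find xi0 in [a, b] with psi(a) * Phi(xi0) = integral of phi * psi over [a, b].

    Assumes Phi(a) = 0, Phi' = phi, and psi(a) > 0; returns None if no sign change is seen.
    """
    target = simpson(lambda x: phi(x) * psi(x), a, b) / psi(a)
    xs = [a + (b - a) * i / grid for i in range(grid + 1)]
    for lo, hi in zip(xs, xs[1:]):
        if (Phi(lo) - target) * (Phi(hi) - target) <= 0:
            for _ in range(60):  # bisection keeps a root inside [lo, hi]
                mid = (lo + hi) / 2
                if (Phi(lo) - target) * (Phi(mid) - target) <= 0:
                    hi = mid
                else:
                    lo = mid
            return lo
    return None


def phi(x):
    return math.cos(5 * x)


def Phi(x):
    return math.sin(5 * x) / 5  # Phi(0) = 0 and Phi' = phi


def psi(x):
    return math.exp(-x)  # nonnegative and nonincreasing


xi0 = bonnet_point(phi, Phi, psi, 0.0, 2.0)
print(xi0, psi(0.0) * Phi(xi0), simpson(lambda x: phi(x) * psi(x), 0.0, 2.0))
```

The exact value of the integral here is $\{e^{-2}(5\sin 10 - \cos 10) + 1\}/26$, which gives an independent check on the quadrature.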


SOLUTION FOR EXERCISE 14.3. The bound (14.32) is immediate from
the second IMVF (14.31). The sine bound (14.33) then follows by taking
$f(x) = 1/x$, $g(x) = \sin x$ and by noting that the integral of $g$ over $[a, b]$
is $\cos a - \cos b$, which is bounded by 2 in absolute value.


SOLUTION FOR EXERCISE 14.4. We are given that $\theta'(\cdot)$ is monotonic,
and, without loss of generality, we assume it is nondecreasing. From the
second IMVF of Exercise 14.2, we find that

$$\int_a^b \cos\theta(x)\,dx = \int_a^b \frac{1}{\theta'(x)}\,\{\theta'(x)\cos\theta(x)\}\,dx = \frac{1}{\theta'(a)}\int_a^{\xi} \theta'(x)\cos\theta(x)\,dx = \frac{\sin\theta(\xi) - \sin\theta(a)}{\theta'(a)}.$$

The last ratio has modulus bounded by $2/\nu$, so to complete the proof,
one only needs to check that an exactly analogous argument applies to
the imaginary part of the integral in our target inequality (14.34).
Since $\theta'(\cdot)$ is strictly monotone, it vanishes at most once in the interval
$[a, b]$, and, for the moment, suppose it vanishes at $c$. To prove the second
bound (14.35), we write the integral $I$ over $[a, b]$ as the sum $I_1 + I_2 + I_3$ of
integrals over $[a, c - \delta]$, $[c - \delta, c + \delta]$, and $[c + \delta, b]$. In the interval $[c + \delta, b]$,
one has $\theta'(x) \ge \rho\delta$, so by the bound (14.34), we have $|I_3| \le 4/(\rho\delta)$. An
analogous bound applies to $I_1$, while for the integral $I_2$ we have the
trivial bound $|I_2| \le 2\delta$. In sum we have

$$|I| \le |I_1| + |I_2| + |I_3| \le \frac{8}{\rho\delta} + 2\delta,$$

which we can minimize by setting $\delta = 2/\sqrt{\rho}$ to obtain the target bound
(14.35). To be 100% complete, one finally needs to note that the target
bound continues to hold if $c + 2/\sqrt{\rho} \notin [a, b]$, or, indeed, if $\theta'(x)$ does not
vanish in $[a, b]$.
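The two bounds can be compared with numerical values of a concrete oscillatory integral. The sketch takes $\theta(x) = x^2$, so that $\theta'(x) = 2x$ is monotone and $\theta'' = 2$. On $[1, 5]$ one has $\theta' \ge \nu = 2$, so (14.34) predicts modulus at most $4/\nu = 2$. On $[-3, 5]$, where $\theta'$ vanishes at $c = 0$, (14.35) with $\rho = 2$ predicts at most $8/\sqrt{2}$. The constants 4 and 8 are the ones produced by the argument above. Composite Simpson’s rule is used for the integrals.

```python
import cmath
import math


def simpson_complex(f, a, b, n=20000):
    """Composite Simpson rule for a complex-valued f on [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def split_bound(rho, delta):
    """The bound 8/(rho*delta) + 2*delta from splitting at c - delta and c + delta."""
    return 8 / (rho * delta) + 2 * delta


def oscillatory(x):
    return cmath.exp(1j * x * x)  # e^{i theta(x)} with theta(x) = x^2


I_first = simpson_complex(oscillatory, 1.0, 5.0)
I_second = simpson_complex(oscillatory, -3.0, 5.0)
rho = 2.0
print(abs(I_first) <= 4 / 2.0, abs(I_second) <= 8 / math.sqrt(rho))  # True True
print(split_bound(rho, 2 / math.sqrt(rho)), 8 / math.sqrt(rho))  # equal up to rounding
```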


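SOLUTION FOR EXERCISE 14.5. To begin, let $W$ denote the target sum,
and note that

$$|W| = \Bigl|\sum_{j \in A} \sum_{k \in B} \exp\Bigl(\frac{2\pi i jk}{p}\Bigr)\Bigr| \le \sum_{j \in A} \Bigl|\sum_{k \in B} \exp\Bigl(\frac{2\pi i jk}{p}\Bigr)\Bigr|,$$

so Cauchy’s inequality gives us

$$|W|^2 \le |A| \sum_{j \in A} \Bigl|\sum_{k \in B} \exp\Bigl(\frac{2\pi i jk}{p}\Bigr)\Bigr|^2.$$

Now we come to a devilish trick: we extend the outside sum to all
of $F_p = \{0, 1, \ldots, p - 1\}$. This is feasible because we are just adding
nonnegative terms, and it is sensible because it sets up the application of
the cancellation identity (14.36). To put the algebra neatly, we first
define the function $\delta(x)$ by setting $\delta(0) = 1$ and $\delta(x) = 0$ for $x \ne 0$, then
we note

$$\begin{aligned}
|W|^2 &\le |A| \sum_{j \in F_p} \Bigl|\sum_{k \in B} \exp\Bigl(\frac{2\pi i jk}{p}\Bigr)\Bigr|^2 = |A| \sum_{j \in F_p} \sum_{k_1, k_2 \in B} \exp\Bigl(\frac{2\pi i j(k_1 - k_2)}{p}\Bigr) \\
&= |A| \sum_{k_1, k_2 \in B} \sum_{j \in F_p} \exp\Bigl(\frac{2\pi i j(k_1 - k_2)}{p}\Bigr) = |A|\, p \sum_{k_1, k_2 \in B} \delta(k_1 - k_2) = p\,|A|\,|B|.
\end{aligned}$$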
This problem and the description “extend and conquer” are from the 
informative exposition of Shparlinski (2002) where one finds several fur- 
ther examples of the ways to exploit complete sums. Shparlinski links 
bounds of the type (14.37) back to the work of I.M. Vinogradov; in 
particular, Exercise 14 of Vinogradov (1954, p. 128) is of this kind. 
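Both the cancellation identity and the resulting bound $|W| \le \sqrt{p\,|A|\,|B|}$ are easy to test numerically. In the sketch, the prime $p = 101$ and the random sets $A$ and $B$ are arbitrary choices. As in the computation above, $W$ is the double sum of $\exp(2\pi i jk/p)$ over $j \in A$ and $k \in B$.

```python
import cmath
import math
import random


def e_p(t, p):
    """exp(2 pi i t / p)."""
    return cmath.exp(2j * math.pi * t / p)


def W(A, B, p):
    return sum(e_p(j * k, p) for j in A for k in B)


def complete_sum(k, p):
    """Sum over j in F_p of exp(2 pi i j k / p): p if k = 0 mod p, else 0."""
    return sum(e_p(j * k, p) for j in range(p))


p = 101
random.seed(7)
A = random.sample(range(p), 30)
B = random.sample(range(p), 40)
print(abs(W(A, B, p)), math.sqrt(p * len(A) * len(B)))  # the first is the smaller
```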


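SOLUTION FOR EXERCISE 14.6. For each $1 \le k \le \lceil \log_2(x) \rceil = K$
we have the bound $g(x/2^{k-1}) - g(x/2^k) \le Ax/2^k + B$. Summing these
gives us $g(x) - g(x/2^K) \le Ax(1 + 1/2 + 1/2^2 + \cdots + 1/2^K) + KB$. Since $x/2^K \le 1$, this gives
$g(x) \le 2Ax + B\lceil \log_2(x) \rceil + \max_{0 \le t \le 1} g(t)$, and since $\lceil \log_2(x) \rceil \le \log_2(x) + 1$, we can take $A' = 2A$,
$B' = B$, and $C' = B + \max_{0 \le t \le 1} g(t)$.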
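A small experiment makes the bookkeeping concrete. The sketch assumes a hypothesis of the form $g(y) - g(y/2) \le Ay/2 + B$ for $y > 1$, which is what the displayed increments use with $y = x/2^{k-1}$. It builds the extremal function that satisfies this hypothesis with equality. The constants $A = 1.5$ and $B = 2$ and the base values on $[0, 1]$ are arbitrary. It then checks $g(x) \le A'x + B'\log_2 x + C'$ with the constants chosen above.

```python
import math

A, B = 1.5, 2.0


def base(t):
    """Arbitrary values of g on [0, 1]; here max over [0, 1] is 3.0."""
    return 3.0 * t


def g(x):
    """Extremal g: g(y) = g(y/2) + A*y/2 + B for y > 1, and g = base on [0, 1]."""
    if x <= 1:
        return base(x)
    return g(x / 2) + A * x / 2 + B


A_prime, B_prime, C_prime = 2 * A, B, B + 3.0

xs = [1.5 * 1.37 ** i for i in range(40)]  # sample points from 1.5 up to about 3e5
print(all(g(x) <= A_prime * x + B_prime * math.log2(x) + C_prime for x in xs))  # True
```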
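SOLUTION FOR EXERCISE 14.7. For any $A \subseteq \{1, 2, \ldots, n\}$ we have

$$\int_0^1 \Bigl(\sum_{j \in A} c_j \psi_j(x)\Bigr)^2 dx = \sum_{j \in A} \sum_{k \in A} c_j c_k \int_0^1 \psi_j(x)\psi_k(x)\,dx \le M \sum_{j \in A} c_j^2,$$

where the last inequality comes from applying the hypothesis (14.39)
with $y_j = c_j$ if $j \in A$ and $y_j = 0$ if $j \notin A$. Next, if we replace $a_j$ by
$c_j \psi_j(x)$ in the real-variable inequality (14.26) and integrate, we find

$$\begin{aligned}
\int_0^1 \max_{1 \le k \le n} \Bigl(\sum_{j=1}^k c_j \psi_j(x)\Bigr)^2 dx
&\le \lceil \log_2(n) \rceil \sum_{B \in \mathcal{B}} \int_0^1 \Bigl(\sum_{j \in B} c_j \psi_j(x)\Bigr)^2 dx \\
&\le \lceil \log_2(n) \rceil\, M \sum_{B \in \mathcal{B}} \sum_{j \in B} c_j^2 \\
&\le \lceil \log_2(n) \rceil \bigl(1 + \lceil \log_2(n) \rceil\bigr) M \sum_{j=1}^n c_j^2,
\end{aligned}$$

which is slightly stronger than the target inequality (14.40).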
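SOLUTION FOR EXERCISE 14.8. From the splitting

$$\rho^{|j-k|} y_j y_k = \rho^{|j-k|/2} y_j \cdot \rho^{|j-k|/2} y_k$$

we see that Cauchy’s inequality gives us

$$\begin{aligned}
\Bigl|\sum_{j=1}^n \sum_{k=1}^n \rho^{|j-k|} y_j y_k\Bigr|
&\le \Bigl\{\sum_{j=1}^n \sum_{k=1}^n \rho^{|j-k|} y_j^2\Bigr\}^{1/2} \Bigl\{\sum_{j=1}^n \sum_{k=1}^n \rho^{|j-k|} y_k^2\Bigr\}^{1/2} \\
&\le \Bigl\{\max_{1 \le j \le n} \sum_{k=1}^n \rho^{|j-k|}\Bigr\} \sum_{j=1}^n y_j^2.
\end{aligned}$$

Next, geometric summation shows that we have

$$\max_{1 \le j \le n} \sum_{k=1}^n \rho^{|j-k|} \le \sum_{j \in \mathbb{Z}} \rho^{|j|} = \frac{1 + \rho}{1 - \rho},$$

so our Cauchy estimate may be reduced to the simple bound

$$\Bigl|\sum_{j=1}^n \sum_{k=1}^n \rho^{|j-k|} y_j y_k\Bigr| \le \frac{1 + \rho}{1 - \rho} \sum_{j=1}^n y_j^2. \qquad (14.64)$$

Given the inequality (14.64), the conclusion of Exercise 14.8 with the
value $M = (1 + \rho)/(1 - \rho)$ follows from Exercise 14.7.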
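The bound (14.64) can be checked against random vectors. The sketch assumes $0 \le \rho < 1$. The value $\rho = 0.6$ below is arbitrary and gives $M = 4$. The sketch also confirms that every row sum of the matrix $(\rho^{|j-k|})$ stays below $(1 + \rho)/(1 - \rho)$.

```python
import random


def quadratic_form(y, rho):
    """sum_j sum_k rho^|j-k| y_j y_k."""
    n = len(y)
    return sum(rho ** abs(j - k) * y[j] * y[k] for j in range(n) for k in range(n))


def max_row_sum(n, rho):
    """max_j sum_k rho^|j-k| over 1 <= j, k <= n."""
    return max(sum(rho ** abs(j - k) for k in range(n)) for j in range(n))


rho, n = 0.6, 40
M = (1 + rho) / (1 - rho)
random.seed(3)
ok = True
for _ in range(200):
    y = [random.gauss(0, 1) for _ in range(n)]
    ok = ok and abs(quadratic_form(y, rho)) <= M * sum(t * t for t in y)
print(max_row_sum(n, rho) <= M, ok)  # True True
```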
SOLUTION FOR EXERCISE 14.9. From the definition of $S_\theta$ one finds

$$f(\theta) = \Bigl|\sum_{z_k \in S_\theta} z_k\Bigr| = \Bigl|\sum_{z_k \in S_\theta} z_k e^{-i\theta}\Bigr| \ge \sum_{z_k \in S_\theta} \operatorname{Re}\bigl(z_k e^{-i\theta}\bigr) = \sum_{z_k \in S_\theta} |z_k| \cos(\theta - \arg z_k).$$

It suffices to show that $\max_\theta f(\theta)$ is as large as the left side of the bound
(14.41). To do this we compute the average,

$$\frac{1}{2\pi}\int_0^{2\pi} f(\theta)\,d\theta \ge \frac{1}{2\pi}\int_0^{2\pi} \sum_{z_k \in S_\theta} |z_k| \cos(\theta - \arg z_k)\,d\theta = \frac{1}{2\pi}\sum_{k=1}^n |z_k| \int_{\arg(z_k) - \pi/2}^{\arg(z_k) + \pi/2} \cos(\theta - \arg z_k)\,d\theta = \frac{1}{\pi}\sum_{k=1}^n |z_k|,$$

so, indeed, there must exist some value $\theta^*$ for which $f(\theta^*)$ is at least as
large as the last sum. By taking $\{z_k = \exp(2\pi i k/N) : 0 \le k < N\}$ for
large $N$, one can show that the constant $1/\pi$ cannot be improved. This
argument follows W.W. Bledsoe (1970); Mitrinović (1970, p. 331) notes
that similar results were obtained earlier by D.Z. Djoković.
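The averaging argument can be mirrored numerically. The sketch scans a grid of angles $\theta$ and, for each one, takes the points in the open half-plane where $\cos(\theta - \arg z) > 0$, as in the integral above. It keeps the largest modulus of their sum. The grid size and the random test points are arbitrary. The second check uses $N$th roots of unity to illustrate why $1/\pi$ cannot be improved.

```python
import cmath
import math
import random


def best_half_plane_sum(z, grid=2000):
    """Max over grid angles theta of |sum of the z_k with cos(theta - arg z_k) > 0|."""
    phases = [cmath.phase(w) for w in z]
    best = 0.0
    for i in range(grid):
        theta = 2 * math.pi * i / grid
        chosen = sum(w for w, arg in zip(z, phases) if math.cos(theta - arg) > 0)
        best = max(best, abs(chosen))
    return best


random.seed(5)
z = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
print(best_half_plane_sum(z) >= sum(abs(w) for w in z) / math.pi)  # True

N = 200
roots = [cmath.exp(2j * math.pi * k / N) for k in range(N)]
print(best_half_plane_sum(roots) / N, 1 / math.pi)  # close for large N
```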


SOLUTION FOR EXERCISE 14.10. If $L$ and $R$ denote the left and right
sides of the target bound (14.42), then by squaring and changing order,
one finds the representation

$$\begin{aligned}
L &= \sum_{r=1}^R \sum_{s=1}^R \sum_{n=1}^N \sum_{m=1}^N a_n y_{nr} y_{ns}\, \bar{a}_m y_{mr} y_{ms}
= \sum_{n=1}^N \sum_{m=1}^N a_n \bar{a}_m \Bigl\{\sum_{r=1}^R \sum_{s=1}^R y_{mr} y_{ms} y_{nr} y_{ns}\Bigr\} \\
&= \sum_{n=1}^N \sum_{m=1}^N a_n \bar{a}_m \Bigl(\sum_{r=1}^R y_{mr} y_{nr}\Bigr)^2,
\end{aligned}$$

and the identical calculation for the right side $R$ shows

$$R = \sum_{n=1}^N \sum_{m=1}^N A_n A_m \Bigl(\sum_{r=1}^R y_{mr} y_{nr}\Bigr)^2,$$

so our hypothesis gives us $L \le R$. The bound (14.42) provides a generic
example of a class of inequalities called majorant principles, and the
treatment given here follows Theorem 4 of Montgomery (1994, p. 132).
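The statement of (14.42) is not reproduced in this solution, so the sketch below assumes the form suggested by the computation above. It takes real $y_{nr}$, complex $a_n$ with $|a_n| \le A_n$, and $L = \sum_r \sum_s |\sum_n a_n y_{nr} y_{ns}|^2$, with the right side given by the same expression with $A_n$ in place of $a_n$. It checks the regrouped representation and the inequality on random data, with arbitrary dimensions. The variable `cols` plays the role of the upper summation limit $R$.

```python
import random


def side(coeffs, y):
    """sum_{r,s} |sum_n coeffs[n] * y[n][r] * y[n][s]|^2 for a real N-by-cols array y."""
    N, cols = len(y), len(y[0])
    return sum(abs(sum(coeffs[n] * y[n][r] * y[n][s] for n in range(N))) ** 2
               for r in range(cols) for s in range(cols))


def representation(coeffs, y):
    """sum_{n,m} c_n conj(c_m) (sum_r y[m][r] y[n][r])^2, the regrouped form of side()."""
    N, cols = len(y), len(y[0])
    total = 0j
    for n in range(N):
        for m in range(N):
            overlap = sum(y[m][r] * y[n][r] for r in range(cols))
            total += coeffs[n] * complex(coeffs[m]).conjugate() * overlap ** 2
    return total.real


random.seed(11)
N, cols = 6, 4
y = [[random.gauss(0, 1) for _ in range(cols)] for _ in range(N)]
A = [random.uniform(0.5, 2.0) for _ in range(N)]
# complex a_n with |a_n| <= A_n (the assumed majorization hypothesis)
a = [An * complex(random.uniform(-1, 1), random.uniform(-1, 1)) / 2 ** 0.5 for An in A]
left, right = side(a, y), side(A, y)
print(abs(left - representation(a, y)) <= 1e-9 * max(1.0, left), left <= right)  # True True
```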


SOLUTION FOR EXERCISE 14.11. The most direct proof just requires a
big piece of paper and a timely application of Cauchy’s inequality. First
expand the squares $|\langle u_m, v_n\rangle|^2$ in terms of the vector components $u_{m,j}$,
$1 \le j \le d$, and $v_{n,k}$, $1 \le k \le d$. Next, change the order of summation
so that the double sum over $j$ and $k$ is outermost, and only now apply
Cauchy’s inequality. Finally, within each of the two resulting rooted
expressions, change the order of summation within each of the braces
and reinterpret the innermost sums as inner products.

This solution amplifies the remark of Montgomery (1994, p. 144) that
manipulations like those used in the solution of Exercise 14.10 can be
used to prove Enflo’s inequality. An alternative solution may be based
on the observation that the functions $\phi_{n,m}(x, y) = e(mx)e(ny)$ are orthonormal
on the square $[0, 1]^2$. One then introduces the function

$$f(x, y) = \sum_{m=1}^M \sum_{n=1}^N \langle u_m, v_n\rangle\, e(mx)e(ny)$$

and exploits the fact that the integral of $|f(x, y)|^2$ over $[0, 1]^2$ gives the
left side of Enflo’s inequality.
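Enflo’s inequality is not restated in this solution. The sketch below assumes its usual form for real vectors, $\sum_{m,n} \langle u_m, v_n\rangle^2 \le \{\sum_{m,m'} \langle u_m, u_{m'}\rangle^2\}^{1/2} \{\sum_{n,n'} \langle v_n, v_{n'}\rangle^2\}^{1/2}$, and tests it on random vectors in $\mathbb{R}^d$. It also checks the “big piece of paper” identity, which moves the sum over the components $j, k$ outermost.

```python
import random


def dot(u, v):
    return sum(s * t for s, t in zip(u, v))


def enflo_sides(U, V):
    lhs = sum(dot(u, v) ** 2 for u in U for v in V)
    gram_u = sum(dot(u, w) ** 2 for u in U for w in U)
    gram_v = sum(dot(v, w) ** 2 for v in V for w in V)
    return lhs, (gram_u * gram_v) ** 0.5


def component_form(U, V):
    """sum_{j,k} (sum_m u_mj u_mk)(sum_n v_nj v_nk): the left side with j, k outermost."""
    d = len(U[0])
    return sum(sum(u[j] * u[k] for u in U) * sum(v[j] * v[k] for v in V)
               for j in range(d) for k in range(d))


random.seed(2)
d = 5
U = [[random.gauss(0, 1) for _ in range(d)] for _ in range(7)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(9)]
lhs, rhs = enflo_sides(U, V)
print(lhs <= rhs, abs(lhs - component_form(U, V)) <= 1e-9 * lhs)  # True True
```

When $U = V$ the two sides coincide, which is a convenient extra check.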


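SOLUTION FOR EXERCISE 14.12. One always has $\langle z, z\rangle \ge 0$, so if we set
$z = x - (c_1 y_1 + c_2 y_2 + \cdots + c_n y_n)$, we find for all $c_j$, $1 \le j \le n$, that

$$0 \le \langle x, x\rangle - \sum_{j=1}^n \bar{c}_j \langle x, y_j\rangle - \sum_{j=1}^n c_j \overline{\langle x, y_j\rangle} + \sum_{j=1}^n \sum_{k=1}^n c_j \bar{c}_k \langle y_j, y_k\rangle.$$

The so-called humble bound $|c_j \bar{c}_k| \le \frac12 |c_j|^2 + \frac12 |c_k|^2$ gives us

$$0 \le \langle x, x\rangle - \sum_{j=1}^n \bar{c}_j \langle x, y_j\rangle - \sum_{j=1}^n c_j \overline{\langle x, y_j\rangle} + \frac12 \sum_{j=1}^n \sum_{k=1}^n |c_j|^2 |\langle y_j, y_k\rangle| + \frac12 \sum_{j=1}^n \sum_{k=1}^n |c_k|^2 |\langle y_j, y_k\rangle|,$$

and if we set $c_j = \langle x, y_j\rangle / \sum_{k=1}^n |\langle y_j, y_k\rangle|$, simple algebra brings us to the
inequality (14.43). This argument is based on the classic exposition of
E. Bombieri (1974).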
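With the choice $c_j = \langle x, y_j\rangle / \sum_k |\langle y_j, y_k\rangle|$, the algebra collapses to $\sum_j |\langle x, y_j\rangle|^2 / \sum_k |\langle y_j, y_k\rangle| \le \langle x, x\rangle$. The sketch below assumes that this is the content of (14.43). It tests the inequality on random complex vectors, using an inner product that is linear in the first argument. For an orthonormal basis the weights all equal 1 and the inequality becomes an equality.

```python
import random


def inner(x, y):
    """<x, y>, linear in x and conjugate-linear in y."""
    return sum(s * t.conjugate() for s, t in zip(x, y))


def bombieri_sides(x, ys):
    lhs = 0.0
    for yj in ys:
        weight = sum(abs(inner(yj, yk)) for yk in ys)
        lhs += abs(inner(x, yj)) ** 2 / weight
    return lhs, inner(x, x).real


def rand_vec(d):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(d)]


random.seed(4)
d = 5
x = rand_vec(d)
ys = [rand_vec(d) for _ in range(8)]
lhs, rhs = bombieri_sides(x, ys)
basis = [[complex(int(i == j)) for j in range(d)] for i in range(d)]
print(lhs <= rhs, bombieri_sides(x, basis))  # True, then two (nearly) equal numbers
```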
CHAPTER 1: STARTING WITH CAUCHY


Bunyakovsky’s 1859 Mémoire was eighteen pages long, and it sold as 
a self-standing piece for 25 kopecks, a sum which was then represented 
by a silver coin roughly the size of a modern US quarter. Yale University 
library has one of the few extant copies of the Mémoire. On the title page 
the author used the French transliteration of his name, Bouniakowsky; 
here this spelling is used in the references, but elsewhere in the text the 
more common spelling Bunyakovsky is used. 

The volume containing Schwarz’s 1885 article was issued in honor of 
the 60th birthday of Karl Weierstrass. In due course, Schwarz came to 
occupy the chair of mathematics in Berlin which had been long held by 
Weierstrass. 

Dubeau (1990) is one of the few articles to advocate the inductive 
approach to Cauchy’s inequality that is favored in this chapter. 

The Cramér—Rao inequality of Exercise 1.15 illustrates one way that 
the Cauchy—Schwarz inequality can be used to prove lower bounds. 
Chapter 6 of Matoušek (1999) gives an insightful development of several
deeper examples from the theory of geometric discrepancy. The recent 
monograph of Dragomir (2003) provides an extensive survey of discrete 
inequalities which refine and extend Cauchy’s inequality. 


CHAPTER 2: THE AM-GM INEQUALITY 


The AM-GM inequality is arguably the world’s oldest nontrivial in- 
equality. As Exercise 2.6 observes, for two variables it was known even 
to the ancients. By the dawn of the era of calculus it was known for 
n variables, and there were even subtle refinements such as Maclaurin’s 
inequality of 1729. Bullen, Mitrinović, and Vasić (1987, pp. 56-89) give
fifty-two proofs of the AM-GM inequality in (essentially) their
chronological order.

Duncan and McGregor (2004) survey several proofs of Carleman’s
inequality, including Carleman’s original, and Pečarić and Stolarsky (2001)
provide a comprehensive historical review.

Pólya’s 1926 article proves in one page what his 1949 article proves in
eight, but Pólya’s 1949 explanation of how he found his proof is one of the
great classics of mathematical exposition. It is hard to imagine a better
way to demonstrate how the possibilities for exploiting an inequality are
enhanced by understanding the cases where equality holds. The quote
from Pólya on page 23 is from Alexanderson (2000, p. 75).


CHAPTER 3: LAGRANGE’S IDENTITY AND MINKOWSKI’S CONJECTURE 


Stillwell (1998, p. 116) gives the critical quote from Arithmetica, Book 
III, Problem 19, which suggests that Diophantus knew the case n = 2 of 
Lagrange’s identity. Stillwell also gives related facts and references that 
are relevant here — including connections to Fibonacci, Brahmagupta, 
and Abu Ja’far al-Khazin. Exercise 3.2 is motivated by a similar exercise 
of Stillwell (1998, p. 218). Bashmakova (1997) provides an enjoyable 
introduction to Diophantus and his namesake equations. 

Lagrange (1771, pp. 662-663) contains Lagrange’s identity for the case 
n = 3, but it is only barely visible behind the camouflage of a repetitive 
system of analogous identities. For the contemporary reader, the most 
striking feature of Lagrange’s article may be the wild proliferation of 
expressions such as ab — cd which nowadays one would contain within 
determinants or wedge products. 

The treatment of Motzkin’s trick in Rudin (2000) helped frame the 
discussion given here, and the theory of representation by a sum of 
squares now has an extensive literature which is surveyed by Rajwade 
(1993) and by Prestel and Delzell (2001). Problem 3.5 was on the 1957 
Putnam Exam which is reprised in Bush (1957). 


CHAPTER 4: ON GEOMETRY AND SUMS OF SQUARES 


The von Neumann quote (page 51) is from G. Zukav (1979, p. 226 
footnote). A long oral tradition precedes the example of Figure 4.1, but 
this may be the first time it has found its way into print. The bound (4.8) 
is developed for complex inner products in Buzano (1971/1973) which 
cites an earlier result for real inner product spaces by R.U. Richards. 
Magiropoulos and Karayannakis (2002) give another proof which de- 
pends more explicitly on the Gram—Schmidt process, but the argument 
given here is closest to that of Fuji and Kubo (1993) where one also finds
an interesting application of the linear product bound to the exclusion 
region for polynomial zeros. 

The proof of the light cone inequality (page 63) is based on the discus- 
sion of Aczél (1961, p. 243). A generalization of the light cone inequality 
is given in van Lint and Wilson (1992, pp. 96-98), where it is used to give 
a stunning proof of the van der Waerden permanent conjecture. Hilbert’s 
pause (page 55) is an oft-repeated folktale. It must have multiple print 
sources, but none has been found. 


CHAPTER 5: CONSEQUENCES OF ORDER 


The bound (5.5) is known as the Diaz–Metcalf inequality, and the
discussion here is based on Diaz–Metcalf (1963) and the comments by
Mitrinović (1970, p. 61). The original method used by Pólya and Szegő
is more complicated, but, as the paper of Henrici (1961) suggests, it may
be applied somewhat more broadly.

The Thread by Philip Davis escorts one through a scholar’s inquiry 
into the origins and transliterations of the name “Pafnuty Chebyshev.” 

The order-to-quadratic conversion (page 77) also yields the traditional 
proof of the Neyman—Pearson Lemma, a result which many consider to 
be one of the cornerstones of statistical decision theory. 


CHAPTER 6: CONVEXITY — THE THIRD PILLAR 


Hölder clearly viewed his version of Jensen’s inequality as the main
contribution of his 1888 paper. Hölder also cites Rogers’s 1887 paper
quite generously, but, even then, Hölder seems to view Rogers’s main
contribution to be the weighted version of the AM-GM inequality.
Everyone who works in relative obscurity may take heart from the fact
that neither Hölder nor Rogers seems to have had any inkling that their
inequality would someday become a mathematical mainstay. Pečarić,
Proschan, and Tong (1992, p. 44) provide further details on the early
history of convexity.

This chapter on inequalities for convex functions provides little
information on inequalities for convex sets, and the omission of the
Prékopa–Leindler and the Brunn–Minkowski inequalities is particularly
regrettable. In a longer and slightly more advanced book, each of these would
deserve its own chapter. Fortunately, Ball (1997) provides a well
motivated introductory treatment of these inequalities, and there are
definitive treatments in the volumes of Burago and Zalgaller (1988) and
Schneider (1993).
CHAPTER 7: INTEGRAL INTERMEZZO


Hardy, Littlewood, and Pólya (1952, p. 228) note that the case $\alpha = 0$,
$\beta = 2$ of inequality (7.4) is due to C.F. Gauss (1777-1855), though
presumably Gauss used an argument that did not call on the inequality 
of Schwarz (1885) or Bunyakovsky (1859). Problem 7.1 is based on 
Exercise 18 of Bennett and Sharpley (1988, p. 91). Problem 7.3 (page 
110) and Exercise 7.3 (page 116) slice up and expand Exercise 7.132 of 
George (1984, p. 297). The bound of Exercise 7.3 is sometimes called 
Heisenberg’s Uncertainty Principle, but one might note that there are 
several other inequalities (and identities!) with that very same name. 
The discrete analog of Problem 7.4 was used by Weyl (1909, p. 239) to 
illustrate a more general lemma. 


CHAPTER 8: THE LADDER OF POWER MEANS 

Narkiewicz (2000, p. xi) notes that Landau (1909) did indeed
introduce the notation $o(\cdot)$, but Narkiewicz also makes the point that Landau
only popularized the related notation $O(\cdot)$ which had been introduced
earlier by P. Bachmann. Bullen, Mitrinović, and Vasić (1987) provide
extensive coverage of the theory of power means, including extensive
references to original sources.


CHAPTER 9: HÖLDER’S INEQUALITY


Maligranda and Persson (1992, p. 193) prove for complex $a_1, a_2, \ldots, a_n$
and $p \ge 2$ that one has the inequality

$$\Bigl|\sum_{j=1}^n a_j\Bigr|^p + \sum_{1 \le j < k \le n} |a_j - a_k|^p \le n^{p-1} \sum_{j=1}^n |a_j|^p. \qquad (14.65)$$

This refines the 1-trick bound $d(a) \ge 0$ which is given on page 144, and
it leads automatically to stability results for Hölder’s inequality which
complement Problem 9.5 (page 145).
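The inequality (14.65) is easy to probe numerically. For $p = 2$ it is an identity, namely Lagrange’s identity in the form $|\sum_j a_j|^2 + \sum_{j<k} |a_j - a_k|^2 = n \sum_j |a_j|^2$. For larger $p$ the sketch below checks it on random complex vectors. The sample sizes and exponents are arbitrary.

```python
import random


def mp_sides(a, p):
    """Left and right sides of the Maligranda-Persson inequality (14.65)."""
    n = len(a)
    lhs = abs(sum(a)) ** p + sum(abs(a[j] - a[k]) ** p
                                 for j in range(n) for k in range(j + 1, n))
    rhs = n ** (p - 1) * sum(abs(t) ** p for t in a)
    return lhs, rhs


random.seed(9)
vectors = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)]
           for _ in range(100)]
lhs2, rhs2 = mp_sides(vectors[0], 2)
print(abs(lhs2 - rhs2) <= 1e-9 * rhs2)  # True: equality when p = 2
print(all(l <= r * (1 + 1e-12) for v in vectors for p in (2.5, 3, 4)
          for l, r in [mp_sides(v, p)]))  # True
```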

Problem 9.6 and the follow-up Exercises 9.14 and 9.15 open the door
to the theory of interpolation of linear operators, which is one of the most
extensive and most important branches of the theory of inequalities. In
these problems we considered the interpolation bounds for any reciprocal
pairs $(1/s_1, 1/t_1)$ and $(1/s_0, 1/t_0)$ anywhere in $S = [0,1] \times [0,1]$, but we
also made the strong assumption that $c_{jk} \ge 0$ for all $j, k$.

In 1927, Marcel Riesz, the brother of Frigyes Riesz (whose work we
have seen in several chapters), proved that the assumption that the $c_{jk}$
are nonnegative can be dropped provided that one assumes that the
reciprocal pairs $(1/s_1, 1/t_1)$ and $(1/s_0, 1/t_0)$ are from the “clear” upper
triangle of Figure 9.3. M. Riesz’s proof used only elementary methods,
but it was undeniably subtle. It was also unsettling that Riesz’s argu- 
ment did not apply to the whole rectangle, but this was inevitable. Easy 
examples show that the interpolation bound (9.41) can fail for reciprocal 
pairs from the “gray” lower half of the unit square S. 


Some years after M. Riesz proved his interpolation theorem, Riesz’s
student G.O. Thorin made a remarkable breakthrough by proving that
the interpolation bound is valid for the whole square $S$ under one
important proviso: it is essential to consider the complex normed linear
spaces $\ell^p$ in lieu of the real $\ell^p$ spaces.


Thorin’s key insight was to draw a link between the interpolation 
problem and the maximum modulus theorem from the theory of ana- 
lytic functions. Over the years, this link has become one of the most 
robust tools in the theory of inequalities, and it has been exploited in 
hundreds of papers. Bennett and Sharpley (1988, pp. 185-216) pro- 
vide an instructive discussion of the arguments of Riesz and Thorin in 
a contemporary setting. 


CHAPTER 10: HILBERT’S INEQUALITY 


Hilbert’s inequality has a direct connection to the eigenvalues of a
special integral equation, which de Bruijn and Wilf (1961) used to show
that for an $n$ by $n$ array one can replace the $\pi$ in Hilbert’s inequality
with the smaller value $\lambda_n = \pi - \pi^5/\{2(\log n)^2\} + O(\log\log n/(\log n)^3)$.
The finite sections of many inequalities are addressed systematically by
Wilf (1970).

Mingzhe and Bichen (1998) show that the Euler—Maclaurin expansions 
can be used to obtain instructive refinements of the estimates on page 
158. Such refinements are almost always a possibility when integrals are 
used to estimate sums, but there can be many devils in the details. 


The notion of “stressing” an inequality is motivated by the discussion
of Hardy, Littlewood, and Pólya (1952, pp. 232-233). The method works
so often that its failures are more surprising than its successes. 


Chung, Hajela, and Seymour (1988) exploit the inequality (10.22) in 
the analysis of self-organizing lists, a topic of importance in theoretical 
computer science. Exercise 10.6 elaborates on an argument which is 
given quite succinctly in Hardy (1936). Maligranda and Persson (1993)
note that Carlson suggested in his original paper that the bound (10.24)
could not be derived from Hölder’s inequality (or Cauchy’s), yet Hardy
was quick to find a path. 
CHAPTER 11: HARDY’S INEQUALITY AND THE FLOP


In 1920 Hardy gave only an imperfect version of the discrete inequality 
(11.2), and his primary point at the time was to record the quantitative 
Hilbert’s inequality described in Exercise 11.5. Hardy raised but did not 
resolve the issue of the best constant, although Hardy gives a footnote 
citing a letter of Issai Schur which comes very close. 

Hardy (1920, p. 316) has another intriguing footnote which cites the
inequality of Rogers (1888) and Hölder (1889) in its pre-Riesz form
(9.34). In this note, Hardy says “the well-known inequality...seems to
be due to Hölder.” In support of his statement, Hardy refers to Landau
(1907), and this may be the critical point at which Rogers’s
contribution lapsed into obscurity. By the time Hardy, Littlewood, and Pólya
wrote Inequalities, they had read Hölder’s paper, and they knew that
Hölder did not claim the inequality as his own. Unfortunately, by the
time Inequalities was to appear, it was Rogers who became a footnote.

The argument given here for the inequality (11.1) is a modest
simplification of the $L^p$ argument of Elliot (1926). The proof of the
discrete Hardy inequality can be greatly shortened, especially (as Claude
Dellacherie notes) if one appeals to ideas of Stieltjes integration. The 
volumes of B. Opic and A. Kufner (1990) and Grosse-Erdmann (1998) 
show how the problems discussed in this chapter have grown into a field. 


CHAPTER 12: SYMMETRIC SUMS 


The treatment of Newton’s inequalities follows the argument of Rosset 
(1989) which is elegantly developed in Niculescu (2000). Waterhouse 
(1983) discusses the symmetry questions which evolve from questions 
such as the one posed in Exercise 12.5. Symmetric polynomials are 
at the heart of many important results in algebra and analysis, so the 
literature is understandably enormous. Even the first few chapters of 
Macdonald (1995) reveal hundreds of identities. 


CHAPTER 13: SCHUR CONVEXITY AND MAJORIZATION 


The Schur criterion developed in Problem 13.1 relies mainly on the
treatment of Marshall and Olkin (1979, pp. 54-58).

The development of the HLP representation is a colloquial rendering
of the proof given by Hardy, Littlewood, and Pólya in Inequalities.


CHAPTER 14: CANCELLATION AND AGGREGATION


Exponential sums have a long rich history, but few would dispute that 
the 1916 paper of Hermann Weyl created the estimation of exponential
sums as a mathematical specialty. Weyl’s paper contained several sem- 
inal results, and, in particular, it pioneered what is now called Weyl’s 
method, where one applies the bound (14.10) recursively to estimate the 
exponential sum associated with a general polynomial. 

The discussion of the quadratic bound (14.7) introduces some of the 
most basic ideas of Weyl’s method, but it can only hint at the delicacy of 
the general case. Van der Corput’s inequality (14.17)
is more special, but van der Corput’s 1931 argument must be one of
history’s finest examples of pure Cauchy—Schwarz artistry. 

Nowadays, the form (14.23) of the Rademacher—Menchoff inequality is 
quite standard, but it is not given so explicitly in the fundamental works 
of Rademacher (1922) and Menchoff (1923). Instead, this form seems 
to come to us from Kaczmarz and Steinhaus. One finds the inequality in
(essentially) its modern form as Lemma 534 in the 1951 second edition 
of their famous monograph of 1935, and searches have not yielded an 
earlier source. 